SHREC 2007-Protein Challenge

Participants

In this track we had two groups participating:
  • B. Li, Y. Fang, K. Ramani, D. Kihara (Purdue University, USA)
  • P. Daras, V. Tsatsaias (ITI, Greece)

The group from ITI participated with two different methods:
  • a three-dimensional shape-structure comparison method (Trace) [5]
  • a graph-based method (Graph) (not yet published)

Each group submitted a ranked list for each of the 30 unknown protein structures, together with the distance of each query to each of the 633 proteins in the training set, as computed by their method. The SCOP classification [1] was considered as the ground truth. Only the ATOM section of the PDB [2] files was provided.

    We also compared the results to the classification achieved by our own method (LMB, Germany) [3]. Since we organized the track, our results are out of competition.

    Methods

    Li et al. focus on the topology of each protein: they use STRIDE [4] to assign the secondary structure, including the hydrogen bonds. They then determine the beta sheets (beta strands connected by hydrogen bonds) and their order. For the main classes a, b, c, d and g, and for the folds of classes a and g, they classified using the length and percentage of alpha helices and beta strands. For the folds within classes b, c and d, they classified using these orders.
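
As an illustration of the kind of "length and percentage" features mentioned above, here is a minimal sketch. It is not the Purdue group's implementation: the function name `ss_features`, the feature names, and the input format (a per-residue string in which `H` marks alpha-helix residues and `E` marks beta-strand residues, as in STRIDE's one-letter codes) are assumptions made for this example.

```python
from itertools import groupby


def ss_features(ss):
    """Summarise a per-residue secondary-structure string.

    Assumption: 'H' = alpha-helix residue, 'E' = beta-strand residue;
    every other letter is treated as "other".
    """
    n = len(ss)
    # Lengths of consecutive runs of one state, e.g. "HHHEE" -> [("H", 3), ("E", 2)]
    runs = [(state, sum(1 for _ in group)) for state, group in groupby(ss)]
    helix_runs = [length for state, length in runs if state == "H"]
    strand_runs = [length for state, length in runs if state == "E"]
    return {
        "helix_fraction": sum(helix_runs) / n if n else 0.0,
        "strand_fraction": sum(strand_runs) / n if n else 0.0,
        "longest_helix": max(helix_runs, default=0),
        "longest_strand": max(strand_runs, default=0),
    }


# Hypothetical 22-residue assignment: one helix of 6, two strands of 4.
features = ss_features("CCHHHHHHCCEEEECCEEEECC")
print(features)
```

Features of this kind could then be passed to any classifier; which classifier the participants used is not stated here.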

    P. Daras and V. Tsatsaias submitted two ranked lists computed with two different methods. The first method (Trace) is described in [5]. The second method (Graph) is called '3D Protein Classification Using Topological and Geometrical Information'. The 3D objects are first segmented according to their molecular structure. Descriptors are then extracted for each segment using spherical harmonics algorithms, and graphs are constructed for the molecules. Finally, a sub-graph matching procedure provides the similarity distances between the graphs.

    Evaluation

    The ranked lists were evaluated with the following simple method: the nearest neighbor in the ranked list, i.e. the protein domain with the smallest distance to the query protein, is considered, and the query protein is assigned to its class. Two points are scored for the correct SCOP fold, one point if only the SCOP class is correct, and zero points if neither is correct. The maximum score is 60, reached when the fold of every query protein is classified correctly.
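
The scoring rule can be sketched in a few lines. This is only an illustration of the rule as described above, not the organizers' evaluation script: the function names, the dictionary-based input format, and the toy identifiers and distances are all assumptions made for this example.

```python
def scop_class(fold_id):
    """Return the class letter of a SCOP fold identifier, e.g. 'c.55' -> 'c'."""
    return fold_id.split(".")[0]


def score_query(true_fold, distances, training_folds):
    """Score one query by its nearest neighbor in the training set.

    distances:      training ID -> distance to the query (hypothetical format)
    training_folds: training ID -> SCOP fold of that training protein
    """
    nearest = min(distances, key=distances.get)  # smallest distance wins
    predicted_fold = training_folds[nearest]
    if predicted_fold == true_fold:
        return 2  # correct SCOP fold
    if scop_class(predicted_fold) == scop_class(true_fold):
        return 1  # correct SCOP class only
    return 0      # neither is correct


# Toy data (made up for illustration only).
training_folds = {"t1": "b.40", "t2": "g.3", "t3": "g.8"}
print(score_query("g.3", {"t1": 0.9, "t2": 0.2, "t3": 0.5}, training_folds))  # 2
print(score_query("g.3", {"t1": 0.9, "t2": 0.7, "t3": 0.5}, training_folds))  # 1
print(score_query("g.3", {"t1": 0.1, "t2": 0.7, "t3": 0.5}, training_folds))  # 0
```

Summing `score_query` over all 30 queries gives a method's total, with a maximum of 60.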

    Of the three submitted methods, the one from the Purdue team performed best (total score 45), even though it uses simple features. The two methods submitted by team ITI misclassified about half of the query proteins; their better method, Graph, scored 29 points. However, the LMB method achieved an even better classification, with a total score of 52.

    The query set was chosen randomly from the 27 SCOP folds. Some proteins consisted of only one domain; others (e.g. Protein2, Protein8, Protein23) consisted of several domains, which, however, all belonged to the same fold. The size of the protein domains ranged from 31 amino acids (Protein11) to 364 amino acids (Protein16).

    References

    [1] A.G. Murzin, S.E. Brenner, T. Hubbard and C. Chothia, SCOP: a structural classification of proteins database for the investigation of sequences and structures, J. Mol. Biol., Vol. 247, pp. 536-540, 1995.
    [2] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov and P.E. Bourne, The Protein Data Bank, Nucleic Acids Research, Vol. 28, pp. 235-242, 2000.
    [3] M. Temerinac, M. Reisert and H. Burkhardt, Invariant Features for Searching in Protein Fold Databases, International Journal on Computer Mathematics, 'Special Issue on Bioinformatics', to appear 2007.
    [4] D. Frishman and P. Argos, Knowledge-Based Protein Secondary Structure Assignment, Proteins: Structure, Function, and Genetics, Vol. 23, pp. 566-579, 1995.
    [5] P. Daras, D. Zarpalas, A. Axenopoulos, D. Tzovaras and M.G. Strintzis, Three-Dimensional Shape-Structure Comparison Method for Protein Classification, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 3, No. 3, pp. 193-207, July 2006.

    Group        Wrong (0 points)   Correct SCOP class only (1 point)   Correct SCOP fold (2 points)   Total score
    Purdue       5                  5                                   20                             5*1 + 20*2 = 45
    ITI(Trace)   15                 8                                   7                              8*1 + 7*2 = 22
    ITI(Graph)   14                 3                                   13                             3*1 + 13*2 = 29
    LMB          2                  4                                   24                             4*1 + 24*2 = 52
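
The totals in this summary follow directly from the per-query scores. As a small check, the sketch below tallies the Purdue column; the list holds the 30 per-query scores (Protein0 to Protein29) transcribed from the per-protein results table on this page, and the function name `tally` is just a name chosen for this example.

```python
from collections import Counter


def tally(scores):
    """Count wrong (0), class-only (1) and fold (2) results and the total score."""
    counts = Counter(scores)
    total = counts[1] * 1 + counts[2] * 2
    return counts[0], counts[1], counts[2], total


# Purdue scores for Protein0 .. Protein29, transcribed from the results table.
purdue = [0, 2, 0, 2, 2, 1, 2, 0, 2, 2,
          0, 2, 1, 2, 1, 1, 2, 2, 2, 2,
          0, 2, 2, 2, 2, 2, 2, 1, 2, 2]

print(tally(purdue))  # (5, 5, 20, 45)
```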

    The table below shows, for each group and each query protein, the SCOP fold predicted from the nearest neighbor and the resulting score (fold / score).

    Query Protein ProteinID SCOP ID Purdue ITI(Trace) ITI(Graph) LMB
    Protein0 1agt g.3 b.40 / 0 g.3 / 2 g.3 / 2 g.3 / 2
    Protein1 1b0b a.1 a.1 / 2 c.2 / 0 a.1 / 2 a.1 / 2
    Protein2 1c6vA c.55 d.58 / 0 c.94 / 1 b.40 / 0 c.55 / 2
    Protein3 1cch a.3 a.3 / 2 d.169 / 0 a.3 / 2 a.3 / 2
    Protein4 1cor a.3 a.3 / 2 a.3 / 2 a.3 / 2 a.3 / 2
    Protein5 1dp4 c.93 c.69 / 1 c.37 / 1 b.40 / 0 c.69 / 1
    Protein6 1dyzA b.6 b.6 / 2 a.1 / 0 b.6 / 2 b.6 / 2
    Protein7 1e9m d.15 b.6 / 0 a.1 / 0 c.3 / 0 d.15 / 2
    Protein8 1eq2B c.2 c.2 / 2 c.93 / 1 a.3 / 0 c.2 / 2
    Protein9 1eylA b.42 b.42 / 2 b.60 / 1 c.2 / 0 b.42 / 2
    Protein10 1fe0A d.58 a.4 / 0 a.39 / 0 d.58 / 2 d.58 / 2
    Protein11 1g26 g.3 g.3 / 2 a.26 / 0 b.34 / 0 g.3 / 2
    Protein12 1gcpA b.34 b.40 / 1 b.1 / 1 b.34 / 2 b.34 / 2
    Protein13 1gglA b.60 b.60 / 2 a.3 / 0 b.60 / 2 b.60 / 2
    Protein14 1gqzA d.19 d.58 / 1 c.1 / 0 c.2 / 0 c.2 / 0
    Protein15 1gyvA b.1 b.47 / 1 b.1 / 2 c.93 / 0 b.7 / 1
    Protein16 1icp c.1 c.1 / 2 a.24 / 0 a.4 / 0 c.1 / 2
    Protein17 1ihmA b.121 b.121 / 2 c.3 / 0 d.58 / 0 b.121 / 2
    Protein18 1il6 a.26 a.26 / 2 b.6 / 0 c.69 / 0 a.26 / 2
    Protein19 1jjf c.69 c.69 / 2 c.23 / 1 c.37 / 1 c.37 / 1
    Protein20 1jr6 c.37 d.15 / 0 b.6 / 0 c.2 / 1 c.23 / 1
    Protein21 1jzmA a.1 a.1 / 2 a.1 / 2 a.1 / 2 a.1 / 2
    Protein22 1kt7 b.60 b.60 / 2 b.40 / 1 b.60 / 2 b.60 / 2
    Protein23 1mi3 c.1 c.1 / 2 c.47 / 1 c.1 / 2 c.1 / 2
    Protein24 1mi3A c.1 c.1 / 2 c.1 / 2 c.23 / 1 c.1 / 2
    Protein25 1pruA a.4 a.4 / 2 g.3 / 0 b.1 / 0 a.4 / 2
    Protein26 1rfjA a.39 a.39 / 2 a.39 / 2 a.39 / 2 a.39 / 2
    Protein27 1vavA b.29 b.47 / 1 c.69 / 0 d.15 / 0 b.29 / 2
    Protein28 1wat a.24 a.24 / 2 c.37 / 0 c.37 / 0 d.58 / 0
    Protein29 1xnc b.29 b.29 / 2 b.29 / 2 b.29 / 2 b.29 / 2
    Total: 45/60 22/60 29/60 52/60

    In the table below, each Results entry gives the PDB ID of the query protein, which links to the ranked list computed with the method named in the column header, followed by the nearest neighbour to the query protein found by that method.

    Query Protein ProteinID SCOP ID Results(Purdue) Results(ITI(Trace)) Results(ITI(Graph)) Results(LMB)
    Protein0 1agt g.3 1agt 1mjc 1agt 1tsk 1agt 1ktx 1agt 2crd
    Protein1 1b0b a.1 1b0b 1flp 1b0b 1bdma1 1b0b 1eca 1b0b 1flp
    Protein2 1c6vA c.55 1c6vA 1npk 1c6vA 1omp 1c6vA 1igp 1c6vA 1itg
    Protein3 1cch a.3 1cch 1dvh 1cch 1prtc2 1cch 351c 1cch 1cor
    Protein4 1cor a.3 1cor 1cyi 1cor 2pac 1cor 351c 1cor 2pac
    Protein5 1dp4 c.93 1dp4 1crl 1dp4 1mmd_2 1dp4 1kab 1dp4 1pea
    Protein6 1dyzA b.6 1dyzA 1aiza 1dyzA 1gdlo1 1dyzA 1aiza 1dyzA 2aza
    Protein7 1e9m d.15 1e9m 1jer 1e9m 1aofa1 1e9m 1coy_1 1e9m 1put
    Protein8 1eq2B c.2 1eq2B 1hrda1 1eq2B 2dri 1eq2B 351c 1eq2B 1xel
    Protein9 1eylA b.42 1eylA 1wba 1eylA 1mup 1eylA 1cyda 1eylA 1tie
    Protein10 1fe0A d.58 1fe0A 1cgpa1 1fe0A 4icb 1fe0A 1afi 1fe0A 1fwp
    Protein11 1g26 g.3 1g26 1chl 1g26 1rfba 1g26 1mmd_1 1g26 1gur
    Protein12 1gcpA b.34 1gcpA 1rip 1gcpA 1yaia 1gcpA 1shfa 1gcpA 1shfa
    Protein13 1gglA b.60 1gglA 1hms 1gglA 1ccr 1gglA 1hms 1gglA 1hms
    Protein14 1gqzA d.19 1gqzA 1vaoa1 1gqzA 1edt 1gqzA 1cyda 1gqzA 1scu
    Protein15 1gyvA b.1 1gyvA 1sgc 1gyvA 1tnn 1gyvA 2lbp 1gyvA 1rsy
    Protein16 1icp c.1 1icp 1jdc_2 1icp 1was 1icp 1aplc 1icp 1oyb
    Protein17 1ihmA b.121 1ihmA 2bpa1 1ihmA 1gnd_1 1ihmA 1pil 1ihmA 2tbv
    Protein18 1il6 a.26 1il6 1huw 1il6 1aiza 1il6 1wht.1 1il6 1ifa
    Protein19 1jjf c.69 1jjf 3tgl 1jjf 1scua2 1jjf 1deka 1jjf 1dar_2
    Protein20 1jr6 c.37 1jr6 1se4_2 1jr6 1jer 1jr6 2cmd_1 1jr6 1ntr
    Protein21 1jzmA a.1 1jzmA 3sdha 1jzmA 3sdha 1jzmA 3sdha 1jzmA 3sdha
    Protein22 1kt7 b.60 1kt7 1hbq 1kt7 2prd 1kt7 1hbq 1kt7 1hbp
    Protein23 1mi3 c.1 1mi3 5ruba1 1mi3 2trcp 1mi3 1nal1 1mi3 1ads
    Protein24 1mi3A c.1 1mi3A 5ruba1 1mi3A 3rubl1 1mi3A 5nul 1mi3A 1ads
    Protein25 1pruA a.4 1pruA 1yrna 1pruA 4cpai 1pruA 2hft_2 1pruA 1oct
    Protein26 1rfjA a.39 1rfjA 1osa 1rfjA 1osa 1rfjA 1ctaa 1rfjA 1osa
    Protein27 1vavA b.29 1vavA 1agja 1vavA 1tib 1vavA 1pga 1vavA 1kit_1
    Protein28 1wat a.24 1wat 1was 1wat 1deka 1wat 1dar_2 1wat 1ab8
    Protein29 1xnc b.29 1xnc 1xnb 1xnc 1xnb 1xnc 1xnb 1xnc 1xnb

     

     


    This Homepage was created by Maja Temerinac for the SHREC 2007 Protein Challenge.
    For more information, please contact:
    Maja Temerinac temerina(at)informatik.uni-freiburg.de or
    Marco Reisert reisert(at)informatik.uni-freiburg.de