Conference proceeding
Classification comparison of prediction of solvent accessibility from protein sequences
ACM International Conference Proceeding Series; Vol. 55: Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29 : Dunedin, New Zealand, pp.333-338
01 Jan 2004
Abstract
The prediction of residue solvent accessibility from protein sequences has been studied by various methods. Direct comparison of these methods is impossible because they use different datasets and different structure definitions. In this paper we choose five classification approaches (Decision Tree (DT), Support Vector Machine (SVM), Bayesian Statistics (BS), Neural Network (NN) and Multiple Linear Regression (MLR)) for predicting solvent accessibility, applying them to the same dataset with the same structure definition so that the methods can be compared directly. We evaluate these methods in a cross-validation test on 2148 unique proteins, using single-sequence and multiple-sequence approaches with a cutoff of 20% for the two-state definition of solvent accessibility. According to the experimental results, SVM and NN are the best predictors on multiple-sequence prediction, with an accuracy of 79% and a correlation coefficient of 0.59, 2-4% better than the other three methods. A further test on a blind test set from the Critical Assessment of Techniques for Protein Structure Prediction experiment (CASP5) is consistent with this result. On single-sequence prediction, DT, BS and MLR perform about the same, at 71-72% accuracy with a correlation coefficient of 0.43. The improvement over the baseline model that uses only the identity of the target residue is small. Local sequence seems to embed very little information on accessibility. Separate training according to protein size improves the prediction when a sufficiently large dataset is available. The consensus prediction combining the five approaches is not significantly better than the best single method.
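The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions that the abstract does not confirm: that the 20% cutoff applies to relative accessibility expressed as a fraction, that a residue exactly at the cutoff counts as buried, that the reported correlation coefficient is the Matthews correlation coefficient, and that the consensus is a simple majority vote. All function names are hypothetical.

```python
from math import sqrt


def two_state_labels(relative_accessibility, cutoff=0.20):
    """Label each residue exposed (1) or buried (0).

    Assumption: values strictly above the cutoff are exposed.
    """
    return [1 if rsa > cutoff else 0 for rsa in relative_accessibility]


def accuracy(actual, predicted):
    """Fraction of residues whose predicted state matches the actual state."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def matthews_cc(actual, predicted):
    """Matthews correlation coefficient for two-state labels.

    Assumption: this is the correlation coefficient the abstract reports.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def majority_vote(per_method_predictions):
    """Combine label lists from several methods by strict majority.

    Assumption: the paper's consensus scheme may differ from this.
    """
    return [
        1 if 2 * sum(votes) > len(votes) else 0
        for votes in zip(*per_method_predictions)
    ]
```

With an odd number of methods, such as the five compared here, a strict majority never ties, so no tie-breaking rule is needed in this sketch.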
Details
- Title
- Classification comparison of prediction of solvent accessibility from protein sequences
- Creators
- Huiling Chen; Huan-Xiang Zhou; Xiaohua Hu; Illhoi Yoo
- Publication Details
- ACM International Conference Proceeding Series; Vol. 55: Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29 : Dunedin, New Zealand, pp.333-338
- Conference
- 2nd Conference on Asia-Pacific Bioinformatics (Dunedin, New Zealand)
- Number of pages
- 1
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science (Informatics)
- Identifiers
- 991019189122004721