A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search

Garg, Aarti; Raghava, Gajendra P. S. (2008) A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search. In Silico Biology, 8 (2). pp. 129-140. ISSN 1386-6338

Official URL: http://iospress.metapress.com/content/2l710lkp5444...

Abstract

Most prediction methods for secretory proteins require a correct N-terminal end of the pre-protein for correct classification. Because large-scale genome sequencing projects sometimes assign the 5'-end of genes incorrectly, many protein sequences lack the correct N-terminus, which leads to incorrect predictions. In this study, a systematic attempt has been made to predict secretory proteins irrespective of the presence or absence of N-terminal signal peptides (also known as classical and non-classical secreted proteins, respectively) using two machine-learning techniques: artificial neural network (ANN) and support vector machine (SVM). We trained and tested our methods on a dataset of 3321 secretory and 3654 non-secretory mammalian proteins using five-fold cross-validation. First, ANN-based modules were developed for predicting secretory proteins using 33 physico-chemical properties, amino acid composition and dipeptide composition; these achieved accuracies of 73.1%, 76.1% and 77.1%, respectively. Similarly, SVM-based modules using 33 physico-chemical properties, amino acid composition and dipeptide composition achieved accuracies of 77.4%, 79.4% and 79.9%, respectively. In addition, BLAST and PSI-BLAST modules designed for predicting secretory proteins based on similarity search achieved 23.4% and 26.9% accuracy, respectively. Finally, we developed a hybrid approach by integrating the amino acid and dipeptide composition based SVM modules with the PSI-BLAST module, which increased the accuracy to 83.2%, significantly better than the individual modules. Using the hybrid module, we also achieved a sensitivity of 60.4% with only 5% false positive predictions. A web server, SRTpred, has been developed based on this study for predicting classical and non-classical secreted proteins from the whole sequence of mammalian proteins; it is available from http://www.imtech.res.in/raghava/srtpred/.
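
The two composition-based feature sets named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the percentage normalization and the handling of non-standard residues are assumptions made for illustration, since the abstract does not specify them. Amino acid composition gives a 20-element vector and dipeptide composition a 400-element vector, either of which could then be passed to an SVM or ANN classifier.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes), in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """Return the percentage of each standard residue (20 values).

    Assumption: non-standard symbols (e.g. X, B, Z) are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    return [100.0 * counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return the percentage of each adjacent residue pair (400 values).

    Assumption: pairs containing a non-standard symbol are skipped,
    so residues separated by such a symbol are never joined.
    """
    seq = sequence.upper()
    valid = set(DIPEPTIDES)
    pairs = [a + b for a, b in zip(seq, seq[1:]) if a + b in valid]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(pairs)
    return [100.0 * counts[d] / len(pairs) for d in DIPEPTIDES]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # hypothetical sequence, for illustration only
    print(len(amino_acid_composition(example)))  # 20 features
    print(len(dipeptide_composition(example)))   # 400 features
```

Unlike single-residue composition, the dipeptide vector retains information about the local order of residues, which is the "order" referred to in the title.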

Item Type: Article
Source: Copyright of this article belongs to IOS Press.
Keywords: Classical Pathway; Non-classical Pathway; Secretory Proteins; Prediction; SRTpred; Redundancy; Dataset Size; ANN; SVM; BLAST; PSI-BLAST; N-terminal Sequence
ID Code: 43066
Deposited On: 09 Jun 2011 11:42
Last Modified: 18 May 2016 00:10
