A novel complexity measure for comparative analysis of protein sequences from complete genomes

Nandi, Tannistha ; Dash, Debasis ; Ghai, Rohit ; Rao, Chandrika B. ; Kannan, K. ; Brahmachari, Samir K. ; Ramakrishnan, C. ; Ramachandran, Srinivasan (2003) A novel complexity measure for comparative analysis of protein sequences from complete genomes. Journal of Biomolecular Structure & Dynamics, 20 (5). pp. 657-668. ISSN 0739-1102

Full text not available from this repository.

Official URL: http://www.jbsdonline.com/c3013/c4104/A-Novel-Comp...

Abstract

Analysis of sequence complexity in proteins is an important step in the characterization and classification of new genomes. A new measure, based on linguistic complexity, has been proposed to compute sequence complexity in protein sequences. The algorithm requires a single parameter, is computationally simple, and provides a framework for comparative genomic analysis. Protein sequences were classified into groups of high or low complexity based on a quantitative measure termed Fc, which is proportional to the fraction of low-complexity sequence present in the protein. The algorithm was tested on the sequences of 196 non-homologous proteins whose crystal structures are available at ≤2.0 Å resolution. Protein sequences of high complexity had 'globular' structures (95% agreement), whereas those of low complexity had non-globular structures (80% agreement). Application of this measure to proteins of unknown structure/function from different genomes revealed that high-complexity sequences constitute the majority in all genomes (about 90% in Archaea, about 93% in Eubacteria, 89% in Saccharomyces cerevisiae and 90% in Caenorhabditis elegans). Aeropyrum pernix (Archaea) and Deinococcus radiodurans (Eubacteria) have the lowest fractions of high-complexity proteins in their respective groups (75% and 80%, respectively). Further, a few bacterial pathogens (Mycobacterium tuberculosis, Pseudomonas aeruginosa) were observed to have a high fraction of low-complexity proteins. The program ScanCom is available from the authors as a PERL script (UNIX system).
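The abstract does not give the exact definition of Fc or say which single parameter ScanCom uses, so the following is only a rough illustration of the general idea, not the authors' method. It assumes the common sum-ratio form of linguistic complexity (distinct substrings observed divided by the maximum possible, summed over all word lengths), treats a sliding-window length as the single parameter, and uses hypothetical `threshold` and `cutoff` values. The returned low-complexity fraction is a stand-in for Fc, not the published formula.

```python
ALPHABET_SIZE = 20  # assumption: the 20 standard amino acids


def linguistic_complexity(seq: str, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Sum-ratio linguistic complexity: distinct k-mers observed over the
    maximum possible, summed over every word length k = 1..len(seq)."""
    n = len(seq)
    observed = 0
    possible = 0
    for k in range(1, n + 1):
        # Distinct substrings of length k actually present in seq.
        observed += len({seq[i:i + k] for i in range(n - k + 1)})
        # At most alphabet_size**k words exist, and at most n - k + 1 fit.
        possible += min(alphabet_size ** k, n - k + 1)
    return observed / possible if possible else 0.0


def low_complexity_fraction(seq: str, window: int = 12, threshold: float = 0.9) -> float:
    """Hypothetical Fc stand-in: fraction of residues covered by at least one
    window whose linguistic complexity falls below `threshold`."""
    n = len(seq)
    if n == 0:
        return 0.0
    w = min(window, n)  # short sequences are scored as a single window
    flagged = [False] * n
    for start in range(n - w + 1):
        if linguistic_complexity(seq[start:start + w]) < threshold:
            for i in range(start, start + w):
                flagged[i] = True
    return sum(flagged) / n


def classify(seq: str, window: int = 12, threshold: float = 0.9, cutoff: float = 0.1) -> str:
    """Label a protein 'low' or 'high' complexity using a hypothetical cutoff."""
    return "low" if low_complexity_fraction(seq, window, threshold) > cutoff else "high"


if __name__ == "__main__":
    # A made-up sequence with a poly-Q stretch, for illustration only.
    demo = "ACDEFGHIKLMNPQRSTVWY" + "Q" * 20
    print(round(low_complexity_fraction(demo), 3), classify(demo))
```

A homopolymer such as `"A" * 12` scores 12/78 (about 0.15) because only one distinct word exists at each length. A window of 12 distinct residues scores exactly 1.0. The window length, threshold and cutoff here would all need calibrating against structural data, for example the kind of globular/non-globular test set described above.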

Item Type: Article
Source: Copyright of this article belongs to Adenine Press.
ID Code: 63000
Deposited On: 24 Sep 2011 15:09
Last Modified: 13 Jul 2012 13:32
