CLUSS: Clustering of protein sequences based on SMS, a new similarity measure
Introduction

The rapid growth of available protein data makes clustering within protein families increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge that helps researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects biological reality, and an accurate estimate of protein sequence similarity is crucial to building such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is designed specifically for non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families.

Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski, Alain Fleury. CLUSS: Clustering of protein sequences based on a new similarity measure. BMC Bioinformatics 2007, 8:286.

Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski. Clustering of Non-Alignable Protein Sequences. BIOKDD'07, 12 August 2007, San Jose, CA, USA.

Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski. A New Alignment-Independent Algorithm for Clustering Protein Sequences. IEEE BIBE'07, October 2007, Harvard Medical School, Boston, Massachusetts, USA.

Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski. CLUSS2: An Alignment-Independent Algorithm for Clustering Protein Families with Multiple Biological Functions. IJCBDD 2008, vol. 1, no. 2, pp. 122-140.