CLUSS: Clustering of protein sequences based on SMS, a new similarity measure



Detecting and Aligning Related Protein Sequences here.


About us

- University of Sherbrooke Newspaper, May 1st 2008

- University of Sherbrooke Newspaper, February 26th 2009

- The 7th International Conference on BioInformatics and BioEngineering (BIBE 2007)

People using CLUSS

- Food and Agriculture Organization of the United Nations, Rome 2009

- Santa Fe Institute, Department of Biology, Stanford University, USA 2009

- Environmental Sciences Division, Oak Ridge National Laboratory, USA 2009

Others ...


CLUSS was developed within the framework of the data mining research tasks of the ProspectUs laboratory.



Last update: 22 June 2011




The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families.
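To make the idea of alignment-free comparison concrete, here is a minimal Python sketch. It is not the SMS measure and not the CLUSS algorithm, whose details are given in the publications listed below. As a simplified stand-in, it scores two sequences by the fraction of exact length-k subsequences they share (a Jaccard index, with no substitution scoring) and groups sequences by single linkage. The function names, the value of k, the threshold, and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_similarity(a, b, k=3):
    """Jaccard index of the two k-mer sets: 0.0 (nothing shared) to 1.0.

    Illustrative stand-in only: SMS itself is defined in the CLUSS papers.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def single_linkage_clusters(seqs, threshold=0.3, k=3):
    """Group sequence indices connected by pairs with similarity >= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # No alignment is computed: every pair is compared by shared k-mers only.
    for i, j in combinations(range(len(seqs)), 2):
        if shared_kmer_similarity(seqs[i], seqs[j], k) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up toy sequences, not real protein data.
toy = ["MKVLAAGIV", "MKVLAAGLV", "PQRSTWYHE", "PQRSTWYHD"]
clusters = single_linkage_clusters(toy)
```

On these toy sequences the first two and the last two end up in separate groups, because no 3-residue subsequence is shared across the two pairs. Exact matching is the main simplification here: a measure that also accounts for amino acid substitutions, as the name Substitution Matching Similarity suggests, would need a scoring step that this sketch omits.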

Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski, Alain Fleury. CLUSS: Clustering of protein sequences based on a new similarity measure. BMC Bioinformatics 2007, 8:286.

Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski. Clustering of Non-Alignable Protein Sequences. BIOKDD'07, 12 August 2007, San Jose, CA, USA.

Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski. A New Alignment-Independent Algorithm for Clustering Protein Sequences. IEEE BIBE'07, October 2007, Harvard Medical School, Boston, Massachusetts, USA.

Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski. CLUSS2: An Alignment-Independent Algorithm for Clustering Protein Families with Multiple Biological Functions. IJCBDD 2008, vol. 1, no. 2, pp. 122-140.