
Finding identical sequence repeats in multiple protein sequences: An algorithm

Journal of Biosciences

Abstract

In recent years, several lines of experimental evidence have suggested that amino acid repeats are closely linked to many disease conditions, as they play a significant role in the evolution of disordered regions of polypeptide segments. Although many algorithms and databases have been developed for such analysis, each has limitations, such as restrictions on the number of amino acids within a repeat pattern and on the number of query protein sequences. To address this, the present work proposes, for the first time, a new method called internal sequence repeats across multiple protein sequences (ISRMPS) to identify identical repeats across multiple protein sequences. It also identifies distantly located repeat patterns in various protein sequences. The method can be applied to study evolutionary relationships, epitope mapping, CRISPR-Cas sequencing methods, and other comparative analyses of protein sequences.
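To make the task concrete, the following is a minimal sketch of finding identical repeats shared by several protein sequences. It is not the ISRMPS algorithm, whose details are not given in this abstract; it is a plain fixed-length substring index. The function name `find_identical_repeats` and the parameters `k` and `min_sequences` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def find_identical_repeats(sequences, k=3, min_sequences=2):
    """Report identical substrings of length k that occur at least twice
    and appear in at least `min_sequences` distinct sequences.

    `sequences` maps a sequence name to its amino acid string.
    Returns a dict: substring -> list of (sequence name, 1-based start).
    Illustrative only; this is NOT the published ISRMPS method.
    """
    index = defaultdict(list)
    for name, seq in sequences.items():
        # Slide a window of width k over the sequence and record
        # where each substring starts (1-based, as is usual for residues).
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].append((name, start + 1))

    repeats = {}
    for pattern, hits in index.items():
        distinct_sequences = {name for name, _ in hits}
        if len(hits) >= 2 and len(distinct_sequences) >= min_sequences:
            repeats[pattern] = hits
    return repeats
```

Calling the function with `min_sequences=1` also reports repeats confined to a single sequence. Because positions are recorded independently of distance, occurrences far apart in a sequence are treated the same as adjacent ones. A fixed `k` is a simplification: finding repeats of arbitrary length would require repeating the scan for each length or using a suffix-based index.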

Acknowledgements

KS thanks the ICMR for funding the project ‘Do protein sequence repeats play a role in biological process and disease conditions’ (ISRM/12(34)/2020). KS, RS and DR thank the Center for Development of Advanced Computing (CDAC) for funding the project ‘An Indian Initiative on setting up a high-fidelity structural data archival/retrieval system for Life Sciences-(PDBi)’. RS thanks the Department of Science and Technology-Science and Engineering Research Board (DST-SERB), New Delhi, India, for providing a research grant and postdoctoral fellowship (PDF/2019/000254). AM thanks the Dr D S Kothari Postdoctoral Fellowship (BL/18-19/0320), funded by the University Grants Commission (UGC). All the authors thank the Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, India, for providing the necessary support.

Author information

Contributions

Prof. SK conceptualized the study. VKM and RS devised the methodology. AM, MS, RS and DR assessed the algorithm and methods. CNR and AM contributed to case studies. AM and RR wrote the manuscript. MS and CNR reviewed the manuscript.

Corresponding author

Correspondence to Sekar Kanagaraj.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest, financial or research-related, concerning this manuscript.

Additional information

Corresponding editor: Deepesh Nagarajan

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Maurya, V.K., Sanjeevi, M., Rahul, C.N. et al. Finding identical sequence repeats in multiple protein sequences: An algorithm. J Biosci 49, 41 (2024). https://doi.org/10.1007/s12038-023-00410-x

  • DOI: https://doi.org/10.1007/s12038-023-00410-x
