Use of 2D FFT and DTW in Protein Sequence Comparison

Pal, Jayanta; Ghosh, Soumen; Maji, Bansibadan; Bhattacharya, Dilip Kumar

doi:10.1007/s10930-023-10160-2


Published: 17 October 2023

Volume 43, pages 1–11, (2024)

The Protein Journal

Jayanta Pal¹˒², Soumen Ghosh¹, Bansibadan Maji¹ & Dilip Kumar Bhattacharya³


Abstract

Protein sequence comparison remains a challenging task for researchers because of its computational complexity: proteins are built from 20 amino acids, whereas genome sequences use only four nucleotides. Moreover, protein sequences from different species differ in length, which poses additional challenges in developing comparison methods, especially alignment-free ones. In this work, an efficient technique for comparing protein sequences is developed using a graphical representation. First, the 20 amino acids are classified into four groups according to their polarity class, narrowing the representational range from 20 symbols to 4. Then a unit-vector technique based on a two-quadrant Cartesian system is proposed to give a new two-dimensional graphical representation of the protein sequence. Two approaches are proposed to cope with the varying lengths of protein sequences from different species: one uses Dynamic Time Warping (DTW) and the other uses a two-dimensional Fast Fourier Transform (2D FFT). The effectiveness of the two techniques is then evaluated on two criteria: quantitative measures based on symmetric distance (SD), and computational speed. Both methods are applied to five data sets: 9 ND4, 9 ND5, 9 ND6, 12 Baculovirus, and 24 TF proteins. The FFT-based method produces the same results as DTW but in less computational time, and the results of the proposed method agree with the known biological references. Further, the present method produces better clustering than the existing methods.
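The abstract outlines a pipeline of polarity grouping, a 2D unit-vector walk, and then either DTW or a 2D FFT to handle unequal lengths. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. The four-class polarity grouping, the specific unit vectors (placed in quadrants I and IV), and the FFT descriptor (zero-padding the 2 × N coordinate array to a common power-of-two length and comparing 2D FFT magnitude spectra) are hypothetical choices, because the abstract does not specify them. The DTW function is the classic O(nm) dynamic-programming recurrence with a Euclidean local cost. The names `POLARITY_CLASS`, `UNIT_VECTOR`, `walk`, `dtw_distance`, `fft2_descriptor`, `next_pow2` and `distance_matrices`, and the toy sequences, are illustrative only.

```python
import numpy as np

# ASSUMPTION: a common four-class polarity grouping of the 20 amino acids;
# the paper's exact grouping is not stated in the abstract.
POLARITY_GROUPS = {
    "nonpolar": "AFGILMPVW",
    "polar": "CNQSTY",
    "acidic": "DE",
    "basic": "HKR",
}
POLARITY_CLASS = {aa: cls for cls, members in POLARITY_GROUPS.items() for aa in members}

# ASSUMPTION (hypothetical): one unit vector per class, all in quadrants I and IV,
# so the x-coordinate always increases along the walk.
H = np.sqrt(3) / 2
UNIT_VECTOR = {
    "nonpolar": np.array([H, 0.5]),
    "polar": np.array([0.5, H]),
    "acidic": np.array([0.5, -H]),
    "basic": np.array([H, -0.5]),
}


def walk(seq):
    """2D graphical representation: point k is the sum of the first k unit vectors."""
    steps = np.array([UNIT_VECTOR[POLARITY_CLASS[aa]] for aa in seq.upper()])
    return np.cumsum(steps, axis=0)  # shape (len(seq), 2)


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping with Euclidean cost between 2D points."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def fft2_descriptor(points, length):
    """HYPOTHETICAL descriptor: zero-pad the 2 x N coordinate array to 2 x length
    and return the magnitude of its 2D FFT, so walks of any length become comparable."""
    padded = np.zeros((2, length))
    padded[:, : len(points)] = points.T
    return np.abs(np.fft.fft2(padded))


def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def distance_matrices(seqs):
    """Pairwise DTW and FFT-descriptor distance matrices for a list of sequences."""
    walks = [walk(s) for s in seqs]
    length = next_pow2(max(len(w) for w in walks))
    descs = [fft2_descriptor(w, length) for w in walks]  # computed once per sequence
    k = len(seqs)
    d_dtw, d_fft = np.zeros((k, k)), np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d_dtw[i, j] = d_dtw[j, i] = dtw_distance(walks[i], walks[j])
            d_fft[i, j] = d_fft[j, i] = np.linalg.norm(descs[i] - descs[j])
    return d_dtw, d_fft


if __name__ == "__main__":
    # Toy sequences, illustrative only (not from the paper's data sets).
    toy = ["MKVLAAGIVG", "MKVLAAGLVG", "DEKRHDEKRH", "MKVSTQ"]
    d_dtw, d_fft = distance_matrices(toy)
    print(np.round(d_dtw, 2))
    print(np.round(d_fft, 2))
```

Because `I` and `L` fall in the same (assumed) nonpolar class, the walks for the first two toy sequences coincide and both distances between them are zero, which shows how the 20-to-4 reduction collapses some substitutions. On cost: DTW fills an n × m table for every pair, whereas each FFT descriptor is computed once per sequence in O(L log L) time, after which each pairwise comparison is a plain vector distance; this is one plausible source of the speed advantage the abstract reports. The symmetric-distance (SD) evaluation of the resulting trees is not shown here.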


Funding

We wish to confirm that there has been no financial support for this work that could have influenced its outcome.

Author information

Authors and Affiliations

Department of ECE, National Institute of Technology, Durgapur, India
Jayanta Pal, Soumen Ghosh & Bansibadan Maji
Department of CSE, Narula Institute of Technology, Kolkata, India
Jayanta Pal
Department of Pure Mathematics, Calcutta University, Kolkata, India
Dilip Kumar Bhattacharya


Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by JP, SG, BM. The first draft of the manuscript was written by DKB and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jayanta Pal.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Consent to Participate

Not applicable.

Consent to Publish

Not applicable.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 1271 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Pal, J., Ghosh, S., Maji, B. et al. Use of 2D FFT and DTW in Protein Sequence Comparison. Protein J 43, 1–11 (2024). https://doi.org/10.1007/s10930-023-10160-2


Accepted: 20 September 2023
Published: 17 October 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s10930-023-10160-2
