Integrating semantic similarity with Dirichlet multinomial mixture model for enhanced web service clustering

Agarwal, Neha; Sikka, Geeta; Awasthi, Lalit Kumar

doi:10.1007/s10115-023-02034-x

Regular Paper
Published: 22 December 2023

Volume 66, pages 2327–2353, (2024)

Knowledge and Information Systems

Abstract

With the accelerated advancement of Web 2.0, developers generally describe the functionality of services in short natural-language text. Keyword-based search techniques are not an efficient way of discovering services from repositories because they suffer from vocabulary problems. Latent Dirichlet allocation (LDA) combined with word embedding techniques is widely adopted for extracting latent features from service descriptions. However, LDA is not effective on short text because of the limited content and sparse word occurrences. The word vectors generated by word embedding techniques are of finer quality than those produced by topic modeling techniques. The Gibbs sampling algorithm for the Dirichlet multinomial mixture (GSDMM) model gives better results on web service description documents because it assigns exactly one topic to each document. In this paper, we evaluate the performance of the GSDMM model with word embeddings and propose the WV+GSDMMK model. The proposed model improves service-to-topic mapping by determining the semantic similarity among features. K-means clustering is then applied to the service-to-topic representation. Results are evaluated on five real-time datasets using intrinsic and extrinsic evaluation measures. Experimental results demonstrate that the proposed method outperforms the baseline techniques, improving the accuracy score by 5%, 18%, 3%, 4%, and 6% on datasets DS1, DS2, DS3, DS4, and DS5, respectively.
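
To illustrate the one-topic-per-document property that the abstract attributes to GSDMM, the following is a minimal sketch of a collapsed Gibbs sampler for a Dirichlet multinomial mixture. It is not the authors' WV+GSDMMK model: the word embeddings, the semantic-similarity step, and the K-means stage are all omitted. The function name `gsdmm`, the hyperparameter values (`K`, `alpha`, `beta`, `iters`), and the toy service descriptions are assumptions made only for this example.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, iters=30, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet multinomial mixture.

    docs is a list of token lists; returns one cluster label per document.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})   # vocabulary size
    m = [0] * K                             # documents per cluster
    n = [0] * K                             # tokens per cluster
    nw = [Counter() for _ in range(K)]      # word counts per cluster
    counts = [Counter(d) for d in docs]     # word counts per document
    z = []                                  # current label of each document

    def move(i, k, sign):
        # Add (sign=+1) or remove (sign=-1) document i from cluster k.
        m[k] += sign
        n[k] += sign * len(docs[i])
        for w, c in counts[i].items():
            nw[k][w] += sign * c

    # Random initial assignment: exactly one cluster per document.
    for i in range(len(docs)):
        z.append(rng.randrange(K))
        move(i, z[i], +1)

    for _ in range(iters):
        for i in range(len(docs)):
            move(i, z[i], -1)
            logp = []
            for k in range(K):
                # Cluster popularity term; the shared normaliser is dropped.
                lp = math.log(m[k] + alpha)
                # Word-match term for each repeated occurrence of each word.
                for w, c in counts[i].items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                # Normalisation over the document length.
                for j in range(len(docs[i])):
                    lp -= math.log(n[k] + V * beta + j)
                logp.append(lp)
            top = max(logp)
            weights = [math.exp(lp - top) for lp in logp]
            z[i] = rng.choices(range(K), weights=weights)[0]
            move(i, z[i], +1)
    return z


# Hypothetical short service descriptions, already tokenised.
docs = [
    "weather forecast city temperature".split(),
    "temperature forecast humidity weather".split(),
    "payment card invoice checkout".split(),
    "invoice payment refund card".split(),
]
print(gsdmm(docs, K=4))
```

Each document receives a single label, in contrast to LDA, where every document is a mixture over topics. The sampler works in log space so that longer descriptions do not cause numerical underflow.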

Author information

Neha Agarwal, Geeta Sikka and Lalit Kumar Awasthi have contributed equally to this work.

Authors and Affiliations

Computer Science and Engineering, Indian Institute of Information Technology Raichur, Raichur, Karnataka, 584135, India
Neha Agarwal
Computer Science and Engineering, National Institute of Technology Delhi, Delhi, Delhi, 110036, India
Geeta Sikka
National Institute of Technology Uttarakhand, Srinagar, Uttarakhand, 246174, India
Lalit Kumar Awasthi

Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Neha Agarwal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Agarwal, N., Sikka, G. & Awasthi, L.K. Integrating semantic similarity with Dirichlet multinomial mixture model for enhanced web service clustering. Knowl Inf Syst 66, 2327–2353 (2024). https://doi.org/10.1007/s10115-023-02034-x

Received: 13 July 2023
Revised: 26 October 2023
Accepted: 22 November 2023
Published: 22 December 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s10115-023-02034-x
