Integrating semantic similarity with Dirichlet multinomial mixture model for enhanced web service clustering

  • Regular Paper
Knowledge and Information Systems

Abstract

With the rapid advancement of Web 2.0, developers generally describe the functionality of services in short natural-language text. Keyword-based search is not an efficient way to discover services in repositories because it suffers from vocabulary problems. Latent Dirichlet allocation (LDA) combined with word embedding techniques is widely adopted for extracting latent features from service descriptions. However, LDA performs poorly on short text because of the limited content and sparse word occurrences. The word vectors generated by word embedding techniques are of finer quality than those produced by topic modeling techniques. The Gibbs sampling algorithm for the Dirichlet multinomial mixture model (GSDMM) gives better results on web service description documents because it assigns exactly one topic to each document. In this paper, we evaluate the performance of the GSDMM model with word embeddings and propose the WV+GSDMMK model. The proposed model improves service-to-topic mapping by determining the semantic similarity among features. K-means clustering is then applied to the service-to-topic representation. Results are evaluated on five real-time datasets using intrinsic and extrinsic evaluation measures. Experimental results demonstrate that the proposed method outperforms the baseline techniques, increasing the accuracy score by 5%, 18%, 3%, 4%, and 6% on datasets DS1, DS2, DS3, DS4, and DS5, respectively.
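The one-topic-per-document behaviour of GSDMM mentioned in the abstract can be illustrated with a minimal collapsed Gibbs sampler. This is only a sketch of the generic GSDMM sampling step as it is commonly described in the literature, not the authors' WV+GSDMMK implementation: the function name `gsdmm`, the hyperparameters `K`, `alpha`, `beta`, the iteration count, and the toy service descriptions are all assumptions made for illustration, and the word-embedding similarity and K-means stages are not shown.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=4, alpha=0.1, beta=0.1, iters=15, seed=0):
    """Assign exactly one topic index to each tokenized document."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})  # vocabulary size
    D = len(docs)
    m = [0] * K                            # number of documents per topic
    n = [0] * K                            # number of word tokens per topic
    nw = [Counter() for _ in range(K)]     # per-topic word counts
    z = []

    # Random initial assignment of every document to one topic.
    for d in docs:
        k = rng.randrange(K)
        z.append(k)
        m[k] += 1
        n[k] += len(d)
        nw[k].update(d)

    def log_score(d, k):
        # Log of the unnormalized probability of topic k for document d,
        # with d already removed from all counts.
        s = math.log(m[k] + alpha) - math.log(D - 1 + K * alpha)
        for w, c in Counter(d).items():
            for j in range(c):
                s += math.log(nw[k][w] + beta + j)
        for i in range(len(d)):
            s -= math.log(n[k] + V * beta + i)
        return s

    for _ in range(iters):
        for idx, d in enumerate(docs):
            # Remove the document from its current topic.
            k_old = z[idx]
            m[k_old] -= 1
            n[k_old] -= len(d)
            nw[k_old].subtract(d)

            # Sample a new topic in proportion to the conditional probability.
            logs = [log_score(d, k) for k in range(K)]
            mx = max(logs)
            weights = [math.exp(v - mx) for v in logs]
            k_new = rng.choices(range(K), weights=weights)[0]

            # Add the document to the sampled topic.
            z[idx] = k_new
            m[k_new] += 1
            n[k_new] += len(d)
            nw[k_new].update(d)
    return z


# Toy service descriptions (hypothetical, for illustration only).
docs = [
    "send sms text message".split(),
    "sms message delivery api".split(),
    "map route geocode location".split(),
    "geocode address map".split(),
]
labels = gsdmm(docs, K=3)
print(labels)  # one topic index per service description
```

Because each description receives a single topic index, the output is a hard service-to-topic mapping; in the pipeline the abstract outlines, a representation derived from such a mapping would then be refined with semantic similarity and passed to K-means.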

Author information

Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Neha Agarwal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Agarwal, N., Sikka, G. & Awasthi, L.K. Integrating semantic similarity with Dirichlet multinomial mixture model for enhanced web service clustering. Knowl Inf Syst 66, 2327–2353 (2024). https://doi.org/10.1007/s10115-023-02034-x

  • DOI: https://doi.org/10.1007/s10115-023-02034-x
