Abstract
With the rapid advancement of Web 2.0, developers generally describe the functionality of services in short natural-language text. Keyword-based search is not an efficient way to discover services in repositories because it suffers from vocabulary problems. Latent Dirichlet allocation (LDA) combined with word embedding techniques is widely adopted for extracting latent features from service descriptions. However, LDA performs poorly on short text because of its limited content and sparse word occurrences. The word vectors generated by word embedding techniques are of finer quality than the representations produced by topic modeling techniques. The Gibbs sampling algorithm for the Dirichlet multinomial mixture (GSDMM) model gives better results on web service description documents because it assigns exactly one topic to each document. In this paper, we evaluate the performance of the GSDMM model with word embeddings and propose the WV+GSDMMK model. The proposed model improves the service-to-topic mapping by determining the semantic similarity among features. K-means clustering is then applied to the service-to-topic representation. Results are evaluated on five real-time datasets using intrinsic and extrinsic evaluation measures. Experimental results demonstrate that the proposed method outperforms the baseline techniques, increasing the accuracy score by 5%, 18%, 3%, 4%, and 6% on datasets DS1, DS2, DS3, DS4, and DS5, respectively.
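The last two steps named in the abstract (a similarity-based service-to-topic representation followed by K-means) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how semantic similarity is computed, so the sketch assumes cosine similarity between averaged word vectors. The topic word lists stand in for the output of a GSDMM run, the two-dimensional vectors are invented, and all function names are hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(words, vectors):
    """Average the embeddings of the words that have a vector."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

def service_topic_matrix(docs, topic_words, vectors):
    """Row d holds the similarity of service description d to every topic."""
    topic_vecs = [mean_vector(ws, vectors) for ws in topic_words]
    return [[cosine(mean_vector(doc, vectors), tv) for tv in topic_vecs]
            for doc in docs]

def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centroids[c])))
            for r in rows
        ]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Invented 2-d "embeddings"; real word vectors would have hundreds of dimensions.
vectors = {
    "map": [1.0, 0.1], "route": [0.9, 0.2], "geocode": [0.95, 0.0],
    "payment": [0.1, 1.0], "invoice": [0.0, 0.9], "card": [0.2, 0.95],
}
# Assumed top words per topic, standing in for the output of a GSDMM run.
topic_words = [["map", "route"], ["payment", "invoice"]]
# Tokenized toy service descriptions.
docs = [["map", "geocode"], ["route", "map"], ["card", "payment"], ["invoice", "card"]]

matrix = service_topic_matrix(docs, topic_words, vectors)
labels = kmeans(matrix, k=2)
```

In this toy setting the two mapping services and the two payment services end up in separate clusters, even though "geocode" and "card" appear in no topic word list. That is the kind of vocabulary gap a similarity-based mapping is meant to bridge.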
Contributions
All authors contributed equally to this work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Agarwal, N., Sikka, G. & Awasthi, L.K. Integrating semantic similarity with Dirichlet multinomial mixture model for enhanced web service clustering. Knowl Inf Syst 66, 2327–2353 (2024). https://doi.org/10.1007/s10115-023-02034-x
DOI: https://doi.org/10.1007/s10115-023-02034-x