Integrating semantic similarity with Dirichlet multinomial mixture model for enhanced web service clustering
Knowledge and Information Systems (IF 2.7), Pub Date: 2023-12-22, DOI: 10.1007/s10115-023-02034-x
Neha Agarwal, Geeta Sikka, Lalit Kumar Awasthi

Abstract

With the accelerated advancement of Web 2.0, developers generally describe the functionality of services in short natural-language text. Keyword-based search techniques are not an efficient way of discovering services from repositories because they suffer from vocabulary problems. Latent Dirichlet allocation (LDA) with word embedding techniques is widely adopted for efficiently extracting latent features from service descriptions. However, LDA is not efficient on short text because of the limited content and the insufficient number of occurring words. The word vectors generated by word embedding techniques are of finer quality than those produced by topic modeling techniques. The Gibbs sampling algorithm for the Dirichlet multinomial mixture (GSDMM) model gives better results on web service description documents because it assigns exactly one topic to each document. In this paper, we evaluate the performance of the GSDMM model with word embeddings and propose the WV+GSDMMK model. The proposed model improves service-to-topic mapping by determining semantic similarity among features. K-means clustering is then applied to the service-to-topic representation. Results are evaluated on five real-time datasets using intrinsic and extrinsic evaluation measures. Experimental results demonstrate that the proposed method outperforms the baseline techniques, with the accuracy score increasing by 5%, 18%, 3%, 4%, and 6% on datasets DS1, DS2, DS3, DS4, and DS5, respectively.
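
The one-topic-per-document behaviour of GSDMM mentioned in the abstract can be illustrated with a minimal sketch of a collapsed Gibbs sampler for the Dirichlet multinomial mixture model. This is plain GSDMM only, not the authors' WV+GSDMMK model: it leaves out the word-embedding similarity step and the final K-means stage. The function name `gsdmm`, the hyperparameter defaults, and the toy service descriptions are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, iters=15, seed=0):
    """Illustrative collapsed Gibbs sampler for the Dirichlet multinomial mixture.

    docs is a list of token lists; returns one cluster label per document.
    Hyperparameter defaults are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})   # vocabulary size
    m = [0] * K                             # documents per cluster
    n = [0] * K                             # tokens per cluster
    nw = [Counter() for _ in range(K)]      # word counts per cluster
    counts = [Counter(d) for d in docs]
    z = [rng.randrange(K) for _ in docs]    # random initial assignment

    def move(d, k, sign):
        # Add (sign=+1) or remove (sign=-1) document d from cluster k.
        m[k] += sign
        n[k] += sign * len(docs[d])
        for w, c in counts[d].items():
            nw[k][w] += sign * c

    for d, k in enumerate(z):
        move(d, k, +1)

    for _ in range(iters):
        for d in range(len(docs)):
            move(d, z[d], -1)
            logp = []
            for k in range(K):
                # Log of the unnormalised conditional for cluster k, with
                # document d excluded from all counts.
                lp = math.log(m[k] + alpha)
                for w, c in counts[d].items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                for i in range(len(docs[d])):
                    lp -= math.log(n[k] + V * beta + i)
                logp.append(lp)
            top = max(logp)
            weights = [math.exp(lp - top) for lp in logp]
            z[d] = rng.choices(range(K), weights=weights)[0]
            move(d, z[d], +1)
    return z


if __name__ == "__main__":
    # Toy service descriptions (hypothetical data).
    services = [
        ["weather", "forecast", "api"],
        ["weather", "temperature", "api"],
        ["payment", "invoice", "billing"],
        ["payment", "card", "billing"],
    ]
    print(gsdmm(services, K=3, seed=1))
```

Each document receives exactly one label, which is the property the abstract credits for GSDMM's advantage on short service descriptions. K is only an upper bound on the number of clusters, since some clusters may end up empty after sampling.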




Updated: 2023-12-23