Enhanced CRNN-Based Optimal Web Page Classification and Improved Tunicate Swarm Algorithm-Based Re-Ranking
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IF 1.5), Pub Date: 2022-11-18, DOI: 10.1142/s0218488522500246
Syed Ahmed Yasin, P. V. R. D. Prasada Rao

This paper develops a new intelligent framework for web page classification and re-ranking. The proposed model has two main phases: (a) classification and (b) re-ranking-based retrieval. In the classification phase, pre-processing is performed first, comprising HTML (HyperText Markup Language) tag removal, punctuation removal, stop-word removal, and stemming. After pre-processing, the words are converted to vectors, and features are extracted by Principal Component Analysis (PCA). Optimal feature selection follows, which is essential for accurate classification because web pages contain many features, and using all of them reduces classification accuracy. A new meta-heuristic algorithm, the Opposition-based Tunicate Swarm Algorithm (O-TSA), performs the optimal feature selection. Finally, the selected features are fed to the Enhanced Convolutional-Recurrent Neural Network (E-CRNN), which is itself enhanced by O-TSA, for accurate web page classification. The outcome of this phase is the assignment of web pages to classes. In the second phase, re-ranking uses O-TSA with an objective function derived from a similarity function (correlation) for URL matching, which yields an optimal re-ranking of web pages for retrieval. Thus, the proposed method yields better classification and re-ranking performance, and reduces space requirements and search time over web documents, compared with existing methods.
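The abstract does not give implementation details, so the following is only a minimal sketch of two of the steps it names: the pre-processing chain (tag removal, punctuation removal, stop-word removal, stemming) and the opposition step that, in opposition-based learning generally, reflects a candidate solution within its bounds. The stop-word list, the crude suffix stemmer, and the function names `preprocess`, `naive_stem`, and `opposite` are all assumptions made for illustration, not the authors' method.

```python
import re
import string

# Assumption: a tiny illustrative stop-word list; the paper does not specify one.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "for"}

TAG_RE = re.compile(r"<[^>]+>")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters of the stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(html):
    """Strip tags and punctuation, lowercase, drop stop words, then stem."""
    text = TAG_RE.sub(" ", html)
    text = text.translate(PUNCT_TABLE).lower()
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


def opposite(position, lower, upper):
    """Opposition-based learning: reflect each coordinate within [lower, upper]."""
    return [lower + upper - x for x in position]


print(preprocess("<p>The pages are ranked, and linking!</p>"))
print(opposite([0.25, 1.0], 0.0, 1.0))
```

In an opposition-based swarm optimizer, each candidate and its opposite are typically both evaluated and the fitter of the two is kept. How O-TSA applies this to feature selection and re-ranking is not stated in the abstract.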




Updated: 2022-11-21