SEHP: stacking-based ensemble learning on novel features for review helpfulness prediction
Knowledge and Information Systems (IF 2.7), Pub Date: 2023-11-27, DOI: 10.1007/s10115-023-02020-3
Muhammad Shahid Iqbal Malik , Aftab Nawaz

Review helpfulness and its impact on purchase decisions are well established. This study presents a robust helpfulness prediction model for customer reviews. To this end, significant textual features of reviews and newly defined reviewer characteristics are explored with a stacking-based ensemble model. More specifically, stylistic, time complexity, summary language, psychological, and linguistic features are introduced. To our knowledge, these features have not previously been explored with a stacking-based ensemble model for review helpfulness prediction. The proposed predictive model is evaluated on three benchmark Amazon review datasets comprising 200,979 reviews in total. Two algorithms are presented to help readers understand the methodology and to help researchers reproduce the results. We compared several machine-learning models, stacking-based ensembles, and 1-dimensional convolutional neural network (1D CNN) models. The stacking-based ensemble model sets a benchmark, obtaining a mean squared error (MSE) of 0.009 with a hybrid combination of the proposed reviewer and textual features. Moreover, the proposed model outperformed five baselines, including a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) model, reducing MSE by 40%. The results also show that, when each is used in a standalone model, review textual features are better predictors than reviewer features. The findings of this article have significant implications for researchers and business owners.
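The abstract does not name the base learners or the meta-learner, so what follows is only a minimal sketch of stacking-based regression in general, not the authors' SEHP pipeline. It assumes two hypothetical base learners (least-squares linear regression and k-nearest-neighbour), a linear meta-learner trained on 5-fold out-of-fold predictions, and synthetic two-feature data standing in for review and reviewer features. It scores with mean squared error because that is the metric the abstract reports.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error, the metric reported in the abstract."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X, y, lam=1e-6):
    """Hypothetical base/meta learner: least squares with a tiny ridge term for stability."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    w = solve(xtx, xty)
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_knn(X, y, k=3):
    """Hypothetical base learner: average target of the k nearest training rows."""
    data = list(zip(X, y))

    def predict(x):
        nearest = sorted(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))[:k]
        return mean(t for _, t in nearest)
    return predict


BASE_LEARNERS = [fit_ridge, fit_knn]


def fit_stacking(X, y, n_folds=5):
    """Train the meta-learner on out-of-fold base predictions, then refit the bases on all data."""
    n = len(X)
    oof = [[0.0] * len(BASE_LEARNERS) for _ in range(n)]
    for f in range(n_folds):
        val_idx = [i for i in range(n) if i % n_folds == f]
        trn_idx = [i for i in range(n) if i % n_folds != f]
        for j, fit in enumerate(BASE_LEARNERS):
            model = fit([X[i] for i in trn_idx], [y[i] for i in trn_idx])
            for i in val_idx:
                oof[i][j] = model(X[i])  # prediction on a row this model never saw
    meta = fit_ridge(oof, y)
    bases = [fit(X, y) for fit in BASE_LEARNERS]
    return lambda x: meta([b(x) for b in bases])


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for review features; NOT the paper's Amazon data.
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [0.6 * a + 0.3 * b * b + rng.gauss(0, 0.02) for a, b in X]
    model = fit_stacking(X[:150], y[:150])
    print("stacked test MSE:", round(mse(y[150:], [model(x) for x in X[150:]]), 4))
```

The out-of-fold step is the design choice that makes this stacking rather than simple averaging: the meta-learner only sees predictions each base model made on rows it was not trained on, so it learns how far to trust each model instead of rewarding whichever one overfits. scikit-learn's `StackingRegressor` packages the same idea for production use.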

Updated: 2023-11-27