A Fistful of Vectors: A Tool for Intrinsic Evaluation of Word Embeddings
Cognitive Computation ( IF 5.4 ) Pub Date : 2024-01-22 , DOI: 10.1007/s12559-023-10235-3
Roberto Ascari , Anna Giabelli , Lorenzo Malandri , Fabio Mercorio , Mario Mezzanzanica

The use of word embeddings—models computed with neural network architectures that encode words as vectors—has grown rapidly across Natural Language Processing applications, including semantic analysis, information retrieval, dependency parsing, question answering, and machine translation. The efficacy of these tasks is closely tied to the quality of the embeddings, underscoring the importance of evaluating and selecting the best embedding models. While established procedures and benchmarks exist for intrinsic evaluation, comprehensive evaluations of intrinsic embedding quality across multiple tasks are conspicuously absent. This paper introduces vec2best, a unified tool encompassing state-of-the-art intrinsic evaluation tasks across diverse benchmarks. vec2best provides the user with an extensive evaluation of word embedding models: it is a framework for evaluating word embeddings trained with various methods and hyper-parameters on a range of tasks from the literature. The tool yields a holistic evaluation metric for each model called the PCE (Principal Component Evaluation). We evaluated 135 word embedding models, trained using GloVe, fastText, and word2vec, on the four tasks integrated into vec2best (similarity, analogy, categorization, and outlier detection) and their respective benchmarks. Additionally, we leveraged vec2best to optimize embedding hyper-parameter configurations in a real-world scenario. vec2best is conveniently accessible as a pip-installable Python package.
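The abstract does not spell out how the PCE aggregates per-task scores, but the name (Principal Component Evaluation) suggests projecting each model's vector of intrinsic-task scores onto a principal component. The sketch below illustrates that general idea only; the score matrix is invented and this is not the authors' implementation or the vec2best API.

```python
# Illustrative sketch: summarizing several intrinsic-evaluation scores per
# embedding model into one number via the first principal component.
# All numbers below are made-up placeholders, not results from the paper.
import numpy as np

# Rows: three hypothetical embedding models.
# Columns: scores on four intrinsic tasks
# (similarity, analogy, categorization, outlier detection).
scores = np.array([
    [0.62, 0.55, 0.71, 0.48],
    [0.70, 0.61, 0.75, 0.52],
    [0.41, 0.38, 0.50, 0.33],
])

# Center the columns and take the first right singular vector,
# i.e. the first principal component of the task-score matrix.
centered = scores - scores.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]
# Fix the sign so that higher task scores yield a higher summary score.
if pc1.sum() < 0:
    pc1 = -pc1

# One holistic number per model: projection onto the first component.
summary = centered @ pc1
ranking = np.argsort(-summary)  # indices of models, best first
```

Because intrinsic-task scores tend to be positively correlated across models, the first component captures most of their shared variance, which is what makes a single summary score meaningful.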




Updated: 2024-01-22