Cardinality Estimation in Inner Product Space
IEEE Open Journal of the Computer Society. Pub Date: 2022-10-17, DOI: 10.1109/ojcs.2022.3215206
Kohei Hirata, Daichi Amagata, Takahiro Hara

This article addresses the problem of cardinality estimation in inner product spaces. Given a set of high-dimensional vectors, a query, and a threshold, the task is to estimate the number of vectors whose inner product with the query is at least the threshold. This problem is important for recent machine-learning applications that represent objects, such as users and items, using matrices. Solutions must be both efficient and accurate. To satisfy these requirements, we propose a sampling-based algorithm. In a pre-processing phase, we build trees of vectors via transformation to a Euclidean space and dimensionality reduction. At query time, our algorithm samples vectors from the nodes that intersect a search range on one of the trees. Although the algorithm is simple, it is fast and effective both in theory and in practice. We conduct extensive experiments on real datasets, and the results demonstrate that our algorithm outperforms existing techniques.
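The problem statement above can be made concrete with a small sketch. It is not the authors' tree-based algorithm: it shows the exact count that is being estimated, a plain uniform-sampling baseline, and one standard way to turn an inner-product threshold into a Euclidean range (appending an extra coordinate so that all vectors share the same norm). The abstract does not say which transformation the paper uses, so that choice, along with every function name below, is an assumption made for illustration.

```python
import random


def inner(x, q):
    """Inner product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(x, q))


def exact_cardinality(vectors, q, tau):
    """Ground truth: number of vectors x with <x, q> >= tau."""
    return sum(1 for x in vectors if inner(x, q) >= tau)


def sampled_cardinality(vectors, q, tau, n_samples, rng):
    """Uniform-sampling baseline (assumed, not the paper's method):
    draw vectors with replacement and scale the observed hit ratio."""
    hits = 0
    for _ in range(n_samples):
        x = vectors[rng.randrange(len(vectors))]
        if inner(x, q) >= tau:
            hits += 1
    return len(vectors) * hits / n_samples


def to_euclidean(vectors):
    """Assumed transformation: append sqrt(M2 - ||x||^2), where M2 is the
    largest squared norm, so every transformed vector has squared norm M2."""
    max_sq = max(inner(x, x) for x in vectors)
    out = [x + [max(0.0, max_sq - inner(x, x)) ** 0.5] for x in vectors]
    return out, max_sq


def squared_search_radius(q, tau, max_sq):
    """With q' = q + [0]: ||x' - q'||^2 = M2 + ||q||^2 - 2<x, q>,
    so <x, q> >= tau  <=>  ||x' - q'||^2 <= M2 + ||q||^2 - 2*tau."""
    return max_sq + inner(q, q) - 2 * tau


def range_cardinality(vectors, q, tau):
    """Count the same set as exact_cardinality, but as a Euclidean range query."""
    transformed, max_sq = to_euclidean(vectors)
    r2 = squared_search_radius(q, tau, max_sq)
    q_ext = q + [0.0]
    return sum(
        1
        for x in transformed
        if sum((a - b) ** 2 for a, b in zip(x, q_ext)) <= r2
    )
```

A possible usage, with toy data chosen only for illustration:

```python
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
q, tau = [1.0, 1.0], 0.5
print(exact_cardinality(vectors, q, tau))
print(range_cardinality(vectors, q, tau))
print(sampled_cardinality(vectors, q, tau, 100, random.Random(0)))
```

Once the threshold is expressed as a distance bound, spatial index structures built for Euclidean range search become applicable, which is presumably why the abstract mentions trees and nodes that intersect a search range.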

Updated: 2022-10-17