Twenty Similarity Functions for Two Finite Sequences
Programming and Computer Software (IF 0.7), Pub Date: 2023-10-09, DOI: 10.1134/s0361768823050031
I. Burdonov, A. Maksimov

Abstract

This paper considers various numerical functions that measure the degree of similarity between two finite sequences. These similarity measures are based on the concept of an embedding of sequences, which we define here. A subsequence is a special case of an embedding. Other cases additionally require that the distances between adjacent symbols of the subsequence be equal in both sequences; this generalizes the notion of a substring, in which all of these distances equal one. Moreover, it may also be required that the distance from the beginning of the sequences to the first embedded symbol, or from the last embedded symbol to the end of the sequences, be equal. Except for the last two cases, an embedding can occur in a sequence more than once. Functions used in the literature include the number of common embeddings and the number of pairs of occurrences of common embeddings. We introduce three additional functions: the sum of the lengths of common embeddings, the sum over common embeddings of the minimum of their numbers of occurrences in the two sequences, and a similarity function based on the longest common embedding. In total, we consider 20 numerical functions; for 17 of them, we propose algorithms of polynomial complexity (including new ones), and for two of them, algorithms of exponential complexity with a reduced exponent. In the Conclusions, we briefly compare these embeddings and functions.
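As a rough illustration restricted to the simplest embedding named in the abstract, the subsequence, the sketch below computes several of the listed quantities directly from their definitions. It is not the authors' algorithms. It assumes that only non-empty subsequences count and that an occurrence is a distinct set of indices spelling the subsequence. All function names are hypothetical. The brute-force enumerator is exponential and meant only for tiny inputs. The two dynamic programs are standard textbook recurrences (longest common subsequence, and an inclusion-exclusion count of occurrence pairs) that serve as polynomial cross-checks.

```python
from collections import Counter
from itertools import combinations


def subsequence_counts(s):
    """Map each distinct non-empty subsequence of s (as a tuple) to its number of
    occurrences, i.e. the number of index sets spelling it. Exponential: tiny inputs only."""
    counts = Counter()
    for k in range(1, len(s) + 1):
        for idx in combinations(range(len(s)), k):
            counts[tuple(s[i] for i in idx)] += 1
    return counts


def brute_force_measures(a, b):
    """Subsequence-based measures computed straight from the definitions (assumed reading)."""
    ca, cb = subsequence_counts(a), subsequence_counts(b)
    common = ca.keys() & cb.keys()  # distinct common non-empty subsequences
    return {
        "common": len(common),                                    # number of common subsequences
        "sum_lengths": sum(len(w) for w in common),               # sum of their lengths
        "sum_min_occurrences": sum(min(ca[w], cb[w]) for w in common),
        "occurrence_pairs": sum(ca[w] * cb[w] for w in common),  # (occurrence in a, occurrence in b)
        "longest": max((len(w) for w in common), default=0),
    }


def lcs_length(a, b):
    """Classic O(len(a) * len(b)) dynamic program for the longest common subsequence."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n]


def occurrence_pairs(a, b):
    """Polynomial count of pairs of occurrences of equal non-empty subsequences.
    N[i][j] counts pairs drawn from a[:i] and b[:j], including the empty pair."""
    m, n = len(a), len(b)
    N = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Pairs that skip a[i-1] or skip b[j-1] (inclusion-exclusion).
            N[i][j] = N[i - 1][j] + N[i][j - 1] - N[i - 1][j - 1]
            # Pairs ending by matching a[i-1] with b[j-1].
            if a[i - 1] == b[j - 1]:
                N[i][j] += N[i - 1][j - 1]
    return N[m][n] - 1  # drop the empty pair


print(brute_force_measures("aab", "ab"))
# {'common': 3, 'sum_lengths': 4, 'sum_min_occurrences': 3, 'occurrence_pairs': 5, 'longest': 2}
print(lcs_length("aab", "ab"), occurrence_pairs("aab", "ab"))  # 2 5
```

For "aab" and "ab", the common subsequences are a, b and ab; a and ab each occur twice in "aab", which is why the occurrence-pair count (5) exceeds the number of common subsequences (3).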

Updated: 2023-10-10