Abstract

Many researchers have used tag information to improve the performance of recommender systems. Examining users’ tags helps reveal their interests and leads to more accurate recommendations. Because user-defined tags are chosen freely and without restrictions, it is difficult to determine their exact meaning and the similarity between tags. Using thesauri and ontologies to find the meaning of tags is not very efficient, because tags are defined freely by users and many datasets contain tags in different languages. Therefore, this article uses mathematical and statistical methods: lexical similarity, and tag co-occurrence to assign semantic similarity. Moreover, because users’ interests change over time, the time of tag assignment is taken into account when co-occurring tags are used to determine tag similarity. A graph is then created based on the similarity of tags. To model users’ interests, communities of tags are identified with community detection methods, and recommendations are made based on these tag communities and the similarity between resources. The performance of the proposed method has been evaluated with precision and recall on two public datasets. The evaluation results show that the precision and recall of the proposed method improve significantly compared with the other methods: on average, recall improves by 5% and precision by 7%.

1. Introduction

People face a rapid and huge growth of data in social systems. Although there is a lot of useful information in various fields, finding accurate and desirable data is difficult and time-consuming. Recommender systems were developed to address this problem. They are software tools and techniques that assist users in various decision-making processes. Users who need to find the right information need a system that supports them, and recommender systems meet this need by providing personalized services according to users’ interests. Recommender systems are used in various fields and applications. One of the best-known implementations is the Amazon website, which uses customers’ purchase behavior and preferences to make offers tailored to each user’s interests.

The overall structure of a recommender system follows three phases: collection, learning, and recommendation [1, 2]. In the first phase, appropriate resources containing relevant information about users are selected. Then, a learner (supervised or unsupervised) analyzes the users’ preferences and extracts their behavioral patterns. The final phase recommends the entities that are most similar to the users’ interests. Although recommender systems share this common core structure, the details vary from application to application. Some of the most sophisticated and heavily used recommender systems in industry are Last.Fm, YouTube, and Amazon [3].

Generally, recommender systems can generate a list of recommendations with approaches such as content-based filtering, collaborative filtering, and hybrid recommendation [4]. According to existing research, conventional collaborative filtering (CF) approaches, which use only user-item rating information to make recommendations, fall into two major categories: memory-based CF and model-based CF. Both can be used to make recommendations in tagging systems [5]. Memory-based methods make recommendations based on the nearest neighbors, whereas model-based methods make recommendations from a model built from users’ data. Tagged, or tagging, systems were later introduced as another type of recommender system. Mathes [6] discussed tags on the web in late 2004.

Recently, social tagging systems have become an important instrument of Web 2.0 that allows users to collaboratively annotate and search content [7, 8]. To facilitate this process, the present research attempts to improve the performance and quality of resource recommendations. Despite the new opportunities they create, social tagging systems revive old problems such as information overload. Recommender systems are well suited to making information related to users’ interests available. However, tagging recommender systems bring new challenges.

In these systems, users are interested in finding tags, content, and even other users. Traditional recommender systems typically work with 2D data arrays. In contrast, the data in these systems form a third-order tensor, or a multilayer graph with user, resource, and tag nodes, and users, resources, and tags have each become a new target of recommendation. Therefore, new approaches and algorithms are needed to address the threefold nature of the data in these systems. Various social tagging systems such as Del.icio.us, Last.Fm, CiteULike, and Flickr let users assign custom tags to resources based on their background knowledge. Users do this to manage, organize, share, discover, and retrieve resources [9]. These systems aggregate information from heterogeneous elements to obtain enriched information, and tags play an essential role in them [10].

Collaborative tagging systems, also known as folksonomies, have grown dramatically on the Web. Tags in these systems significantly organize the content of websites and other resources and effectively reflect user behavior, which is an advantage of these systems. Tags also act as a bridge between users and resources by describing users’ interest in resources [11]. Researchers use a variety of strategies to capture users’ interests and make more accurate recommendations. Indeed, one of the most important concerns in the field of recommender systems is to provide more accurate recommendations according to users’ interests.

This article focuses on one of the major challenges of recommender systems: improving the performance of recommender algorithms. To this end, it relies on modeling users’ interests and clustering tags based on users’ tagging behavior. To cluster tags and provide a more accurate analysis, we use community detection, which we consider an important contribution of this work. We examine users’ tagging behavior closely, apply the proposed similarity criteria, form a graph, and use community detection. In this way, we obtain users’ interests and finally strengthen the recommendations with the nearest-neighbor method.

Tagging activities in folksonomies are not guided by any formal regulations (no dictionaries, no thesaurus), which means that users can tag resources with any tags they like [12]. This leads to a wide variety of tags, including inflections, spelling errors, abbreviations, and creative compounds. Tags in a folksonomy can be interpreted as concepts [13]. As a result, the main problems in tagging-based recommender systems arise from discovering the meanings of tags. Ambiguous tag meanings, and failure to discover those meanings correctly, degrade the performance of these systems. Most previous studies have used tag co-occurrence to capture semantic relations, and a number of studies have strengthened this method with ontologies or other external knowledge. Semantic approaches try to find relationships between tags by mapping them to meanings in external knowledge sources. Djuana Tjhwa et al. [14] used WordNet concepts to derive such relationships. However, WordNet is a static resource, and less than half (48.7%) of tags can be matched to it directly. Wu et al. [10] described a domain ontology development approach that extracts domain terms from folksonomies and enriches them with data and vocabularies from the Linked Open Data cloud. This yields lightweight domain ontologies that combine the emergent knowledge of social tagging systems with formal knowledge from ontologies [15]. In general, it is difficult to choose the concept that matches a tag because the tagging context is missing. The way users tag is also very different from the way lexicographers or domain specialists define terms. This tag disambiguation problem is discussed by García-Silva et al. [15]. Even if a tag lexically matches a concept in an external source, it is unclear whether their intended meanings agree [16].

In this work, we present a new collaborative-filtering method for resource recommendation, called social collaborative based on community detection with semantic and lexical connections of tags. When users assign tags to resources, these tags clearly show their preferences and interests. Examining the interactions between users and tags helps to understand the semantic correlation between resources and users. It also extracts users’ interests more accurately than rating-based recommender systems can.

The main contributions of this paper are as follows:
(i) To the best of our knowledge, this is the first attempt to leverage the semantic and lexical similarity of tags at the same time, taking the time of tag assignment into account, to construct a graph of tags. These similarities are used to obtain the association strength between tags.
(ii) We apply community detection methods to cluster tags, which leads to precise modeling of users’ interests.
(iii) This is the first work to use the Ellenberg similarity criterion for resource similarity in the recommendation phase. With this criterion, both the similarities and the differences between resources are taken into account.
(iv) We have conducted experiments on two real-world datasets to evaluate the effectiveness of the method. The results show that the proposed method outperforms the state-of-the-art recommendation methods.
(v) Unlike previous methods, we do not use any external linguistic resources such as WordNet, or semantic resources such as ontologies. This makes the method more robust and allows it to cover most tags. WordNet and other external knowledge sources are maintained manually by experts and therefore remain unchanged over long periods. The low coverage of WordNet inevitably leads to poor performance of WordNet-based tag sense disambiguation methods. As in previous methods, we use tag co-occurrence, but we combine it with the time of tag assignment and with lexical similarity to strengthen the relations between tags.
(vi) Another strength of the proposed method is the use of community detection to analyze tags and find appropriate clusters of them. Together, these choices aim to improve the quality of the system’s recommendations.

The rest of the paper is organized as follows. Section 2 summarizes related work. Section 3 presents the proposed method, with subsections on generating the tag graph, community detection, and the resource recommendation stage. Section 4 evaluates the performance of the proposed solution and reports its results. Finally, Section 5 presents conclusions and future work.

2. Related Work

Nowadays, tag-based recommender algorithms are evolving rapidly. In general, tag-based recommender systems make recommendations to users by analyzing the tags assigned to resources. Traditional recommender systems, especially CF, use only two-dimensional data based on user-resource ratings, often in the form of a user-resource rating matrix. Tagging systems, that is, collaborative tagging, add another dimension of information, namely social tags, which serve as a powerful mechanism for making more accurate recommendations.

Although some studies have been performed on tagging recommender systems, more research is still needed because these systems pose many challenges. Many researchers have tried to develop solutions that produce better recommendations according to users’ interests. Some of these investigations have been partly successful, and others work only under certain conditions. In this section, we review some related studies.

Tso-Sutter et al. [17] used tag information as an additional source alongside the user rating matrix in a content-based recommender system. They extended the user-item matrix to a user-item-tag matrix and used the Jaccard similarity criterion to find neighbors. However, because of problems with tag quality, their memory-based method was not very successful in improving performance. Niwa et al. [18] recommended web pages based on an analysis of the tags used and the degree of relationship between tags and users. However, the accuracy of their recommendations was only between 40% and 60%, which is not a good result. Compared with similar methods, their only advantage was lower complexity, because the method relies on tags rather than page browsing. Sen et al. [19] used a special tag ranking function to obtain users’ tag preferences. They also used additional information such as search history and click streams, which is harder to use in real systems than the inputs of other methods.

Some researchers have examined various aspects of tagging systems [11, 20–24]. In the field of recommendation, the information contained in these systems has proved significant for recommending resources, tags, and users [21]. Goel and Kumar [11] examined the efficiency of tags in organizing the items to be encoded. The reasons for the effectiveness of social tagging systems were studied in [21]. Lamere [22] examined the relevance of tags in music information retrieval. Golder and Huberman [24] analyzed the structure and usage patterns of the social tagging system Del.icio.us and compared collaborative tagging with taxonomies.

Xu et al. [25] proposed an algorithm that recommends tags using collaborative tagging information. Their algorithm considered the tags that many users assigned to the target document. It tried to minimize conceptual overlap among the recommended tags in order to increase coverage of small documents. Unfortunately, this method did not cover new documents. Their analysis of tags is relevant to our work, but our goal is different: we recommend resources rather than tags.

Instead of analyzing tags, Zhang et al. [26] used the features of the tagged resources and combined them with the CF method to model user interest. This method identified implicit relationships that the traditional CF method misses. However, determining the features of the resources was one of the problems that reduced the efficiency of the system.

Wu et al. [27] proposed the tag2word model, a content-based method for determining the semantic relationships between tags. Their method was able to reinforce the recommendations, but it works properly only when the tags also appear in the content of the documents. This is a clear limitation, because many of the tags that users assign do not appear in the content of the resources. Therefore, the method does not apply to all types of systems. According to the authors’ research, the method gives better recommendations on datasets in which tags frequently appear in the titles or text of the resources. This solution was presented as a content-based recommendation method.

de Gemmis et al. [28] combined semantic analysis of tags with a content-based approach and used WordNet to disambiguate tag meanings. In this approach, they combined the traditional content-based method with semantic analysis of tags and made recommendations according to user interest. However, their method was also unable to disambiguate tags successfully.

Wartena et al. [29] used the distribution of co-occurring tags to build a tag recommendation system, combining the CF method with this idea. Their method did not outperform the other methods. They showed that the method works better when a single user assigns more tags to resources.

By type of recommendation, tag-based recommender systems are usually divided into three categories: tag, resource, and user recommendation. The type of recommendation does not really matter here, because all three categories make recommendations based on tags [30].

Ignatov et al. [31] created profiles for radio stations and users from the tags of the songs they listened to, and used tags as they were released online to update these profiles dynamically. Vall [32] and Zheng et al. [33] implicitly created tag-based profiles for music recommendation. Xie et al. [34] added emotions to user profiles and tagged resources. Fernández-Tobías and Cantador [35] proposed a way to extend tag-based user and resource profiles to build cross-domain recommendations.

In general, there are currently two main approaches for tagging systems: graph-based and content-based [36]. Graph-based solutions can use graph analysis methods, and community-based analysis is one such method. Tagging systems also raise another issue: the methods for discovering the meaning of tags and mapping meanings to them.

Three kinds of methods have been proposed for this purpose: (1) methods based on clustering, (2) methods based on ontologies, and (3) hybrid methods that combine the first two. Ontology-based methods are not suitable for determining the relationships between terms.

To achieve sustainable ontology-based systems, ontologies should be built by people with domain knowledge, not just by knowledge experts [37]. This is costly and time-consuming. These methods were adopted in the hope of resolving semantic ambiguity, but they have not solved the problem [38]. In addition, because these methods rely on external knowledge such as WordNet and wikis, they cannot cover all the tags used. They therefore increase the workload without fully solving the problem.

Most clustering-based methods, like ontology-based methods, use external knowledge sources such as WordNet and wikis to determine the semantic relationships of tags, so they share the problems described above. A study on the Last.Fm dataset found that over 50% of the tags used were not covered by WordNet or any other traditional lexical resource [24]. After examining the existing methods, we concluded that the simple and effective approaches many researchers use to capture semantics in folksonomies are based on mathematical and statistical formulas. These formulas play an important role, and their main advantage is that they are clear and unambiguous [39]. Therefore, the semantic and lexical relationships of tags can be determined with statistical and mathematical methods. The proposed solution does not use any external semantic sources such as ontologies or thesauri. Instead, it uses precise, formal methods to determine the semantic relationships of tags, which makes it well suited to managing the large number of tags in a folksonomy. Because mathematical and statistical methods extract the semantic and lexical relations of tags with good accuracy, they are suitable for the proposed solution. After determining the semantic and lexical relationships of tags, we cluster the tags with community detection. This effective clustering approach is also used in graph-based tagging systems. Community detection provides a more accurate analysis of the relationships between graph elements. In general, community-based solutions are a good way to analyze networks, and they make more personalized recommendations possible. The quality of recommendations therefore increases, which is the advantage of our proposed solution.

3. Proposed Method

In this section, we examine users’ tagging behavior, which reveals their interests. To this end, we group the tags assigned to resources and use these groups to determine users’ interests.

A social tagging system consists of a set of users (U), a set of tags (T), and a set of resources (R), defined in Equation (1):

$$U = \{u_1, u_2, \dots, u_n\}, \quad T = \{t_1, t_2, \dots, t_m\}, \quad R = \{r_1, r_2, \dots, r_k\} \qquad (1)$$

where n is the number of users, m is the number of tags, and k is the number of resources. In these systems, a folksonomy is defined as <U, R, T, Y>, where Y is a ternary relation between them, that is, Y ⊆ U × R × T [40]. Various public datasets are available for evaluating recommender algorithms, and we chose the Del.icio.us dataset to evaluate the work. The proposed method does not use any external thesaurus or ontology, so it also supports languages other than English, which makes this dataset suitable for evaluation.
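The following sketch makes the folksonomy model concrete. It stores Y as a list of assignment records and derives the sets used later in this section: R(t), the resources tagged with t, and U(r), the users who annotated r. This is a minimal Python illustration, not the authors’ implementation. The class Assignment, the function index_folksonomy, and the timestamp field (assumed to be a Unix time) are hypothetical names introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """One element of Y: `user` assigned `tag` to `resource` at `timestamp`.

    The timestamp field (assumed Unix seconds) is a hypothetical addition
    used by the time-aware similarity later in this section.
    """
    user: str
    tag: str
    resource: str
    timestamp: int


def index_folksonomy(Y):
    """Derive the lookup sets used in this section from the relation Y."""
    resources_of_tag = defaultdict(set)   # R(t): resources tagged with t
    users_of_resource = defaultdict(set)  # U(r): users who annotated r
    resources_of_user = defaultdict(set)  # resources annotated by each user
    for a in Y:
        resources_of_tag[a.tag].add(a.resource)
        users_of_resource[a.resource].add(a.user)
        resources_of_user[a.user].add(a.resource)
    return resources_of_tag, users_of_resource, resources_of_user


Y = [
    Assignment("u1", "python", "r1", 100),
    Assignment("u1", "programming", "r1", 105),
    Assignment("u2", "python", "r2", 200),
]
R_of, U_of, RU_of = index_folksonomy(Y)
print(sorted(R_of["python"]))  # ['r1', 'r2']
```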

The proposed approach consists of two main phases. The first phase includes two steps: (1) creating a graph of tags and (2) identifying communities of tags. The second phase makes recommendations based on the communities created from the tags and the resources available in each community. The phases of the proposed solution are explained below.

3.1. Generating Tags Graph

As explained above, the proposed solution includes two phases, and the first phase has two stages. The first stage builds the tag graph. The nodes of the graph are tags, and the edge weights are determined by lexical similarity, semantic similarity, and the time of tag assignment. The weight of the edge between two tags ti and tj is denoted w(ti, tj). After the graph is generated, the second stage identifies the tag communities. In other words, this work detects communities of user tags and builds communities of resources from them. For each community of tags, a community of relevant resources and users is created. Finally, resources are recommended to the target user based on each resource’s probability of membership in the communities and on the strength of its local neighborhood. This makes it possible to identify users’ interests accurately and provide precise recommendations. To create the graph in the first phase, the relationships between tags are determined from their semantic and lexical similarity and the time of tag assignment. The first innovation of the proposed method is therefore how it determines tag relationships. It combines semantic similarity (which considers the time of tag assignment) with lexical similarity, instead of relying on external linguistic or semantic sources. Social tags are very useful. However, tagging is free-form and tags lack explicit meaning in social tagging systems, so many obstacles may prevent their useful application [7].

One of these obstacles is syntactic variation: the same word may appear in tags in different syntactic forms. For example, one user may annotate a web resource with the tag “picture,” while another user uses the tag “pictures” (other examples are “web,” “web20,” “acm,” “acmi,” “acma,” and so on). Sometimes words from different languages with the same meaning and very similar spellings are used (e.g., “centre” and “center,” and “absurd” and “absurde”). These variations must be handled to achieve satisfactory performance; otherwise, they may lead to confusion [7]. These are the reasons that motivated the use of lexical similarity. Semantic and lexical similarities are used to weight the edges between graph nodes (tags), so the weights express the strength of the connection between nodes. This results in accurate clustering of tags and allows the approach to handle a larger portion of the tags in the dataset. Semantic relatedness is obtained from tag co-occurrence. However, unlike previous methods, the approach takes into account that users’ interests change over time.

Therefore, the time of tag assignment is added as a parameter when co-occurring tags are used. If the assignment times of two co-occurring tags are close, the tags receive a higher score, and their semantic correlation is stronger. Jaccard similarity is used to measure the similarity of co-occurring tags, as defined in Equation (2):

$$sim_{Jac}(t_i, t_j) = \frac{|R(t_i) \cap R(t_j)|}{|R(t_i) \cup R(t_j)|} \qquad (2)$$

where R(ti) is the set of resources tagged with ti. When two tags co-occur, their semantic similarity is first calculated with the Jaccard formula. Their lexical similarity is then calculated from the Levenshtein distance and denoted simLev. To capture lexical relatedness and morphological variants of tags, we apply simLev with a high threshold. This resolves minor morphological changes as well as misspellings, which are two common problems in social tagging systems. Some tags are not semantically related but are strongly similar lexically, and for these tags the lexical similarity is used as the edge weight. The formula for simLev is given in Equation (3).
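The two similarities can be sketched in Python as follows. jaccard implements Equation (2) over resource sets. The exact form of Equation (3) is not given in this text, so sim_lev assumes the common normalization 1 - lev(ti, tj) / max(|ti|, |tj|). This normalization and all function names are assumptions for illustration.

```python
def jaccard(resources_i, resources_j):
    """Equation (2): |R(ti) ∩ R(tj)| / |R(ti) ∪ R(tj)| over resource sets."""
    union = resources_i | resources_j
    if not union:
        return 0.0
    return len(resources_i & resources_j) / len(union)


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute if different
        prev = curr
    return prev[-1]


def sim_lev(a, b):
    """Assumed normalization: 1 - distance / length of the longer tag."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("picture", "pictures"))             # 1
print(round(sim_lev("centre", "center"), 3))          # 0.667
print(round(jaccard({"r1", "r2"}, {"r2", "r3"}), 3))  # 0.333
```

With a high threshold (for example α = 0.8, a value chosen here only for illustration), “picture”/“pictures” (0.875) would count as lexically similar, while “centre”/“center” (about 0.667) would not. This shows how the threshold trades recall of variants against false matches.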

After both similarities are obtained, the larger of the lexical and semantic similarities is taken as the similarity between the two tags. If two tags do not co-occur and their lexical similarity is greater than a threshold α, that value is taken as their similarity. Because users’ interests change over time, we consider another similarity for co-occurring tags, based on the time of tag assignment, denoted simtime(ti, tj). Let Timestamp(ti, rk) denote the last time tag ti was assigned to resource rk. The set of common resources of two co-occurring tags ti and tj whose assignment times are close is denoted nco(ti, tj). The formulas for nco(ti, tj) and simtime(ti, tj) are given in Equations (4) and (5).
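A minimal sketch of the time-aware co-occurrence similarity follows. The text does not state how “close” assignment times are measured or how simtime is normalized. This sketch therefore assumes that two assignments are close when their timestamps differ by at most a hypothetical window parameter. It also assumes that simtime(ti, tj) is |nco(ti, tj)| divided by the number of resources the two tags share. Y is assumed to hold (user, tag, resource, timestamp) tuples, and the function names are illustrative.

```python
def last_timestamps(Y):
    """Timestamp(t, r): the last time tag t was assigned to resource r."""
    last = {}
    for _user, tag, resource, ts in Y:
        key = (tag, resource)
        last[key] = max(ts, last.get(key, ts))
    return last


def resources_with(tag, last):
    """R(t) recovered from the timestamp table."""
    return {r for (t, r) in last if t == tag}


def n_co(ti, tj, last, window):
    """Common resources whose last assignments of ti and tj are at most
    `window` apart (the window is a hypothetical closeness criterion)."""
    common = resources_with(ti, last) & resources_with(tj, last)
    return {r for r in common if abs(last[(ti, r)] - last[(tj, r)]) <= window}


def sim_time(ti, tj, last, window):
    """Assumed form: share of common resources whose assignments are close in time."""
    common = resources_with(ti, last) & resources_with(tj, last)
    if not common:
        return 0.0
    return len(n_co(ti, tj, last, window)) / len(common)


Y = [("u1", "web", "r1", 100), ("u1", "design", "r1", 130),
     ("u2", "web", "r2", 100), ("u3", "design", "r2", 9000)]
last = last_timestamps(Y)
print(n_co("web", "design", last, window=3600))      # {'r1'}
print(sim_time("web", "design", last, window=3600))  # 0.5
```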

Thus, simtime is considered when two tags co-occur. Finally, the graph of tags is created, and the weight between two nodes is calculated by Equation (6).
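Equation (6) is not reproduced here, so the following is only one hypothetical reading of the rules stated above. For co-occurring tags, the weight is the larger of the semantic and lexical similarities plus simtime; the additive combination is our assumption. For non-co-occurring tags, the weight is simLev when it exceeds α. Otherwise there is no edge. The function name edge_weight is illustrative.

```python
def edge_weight(s_jac, s_lev, s_time, co_occur, alpha):
    """Hypothetical reading of Equation (6) built from the stated rules.

    - co-occurring tags: max(semantic, lexical) + time-based similarity
      (adding s_time is an assumption, not the paper's confirmed formula);
    - non-co-occurring tags: lexical similarity if it exceeds alpha;
    - otherwise: weight 0, i.e. no edge.
    """
    if co_occur:
        return max(s_jac, s_lev) + s_time
    if s_lev > alpha:
        return s_lev
    return 0.0


print(edge_weight(0.2, 0.875, 0.5, co_occur=True, alpha=0.8))   # 1.375
print(edge_weight(0.0, 0.875, 0.0, co_occur=False, alpha=0.8))  # 0.875
print(edge_weight(0.0, 0.5, 0.0, co_occur=False, alpha=0.8))    # 0.0
```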

The pseudocode for generating the graph of tags is shown in Algorithm 1.

G = GenGraph(T)
Input: T: the set of tags in the dataset
Output: G = (V, E), where V is the set of nodes and E is the set of weighted edges of G
 V = {}
 E = {}
 for each ti ∈ T do
  add ti to V if it does not already exist
  for each tj ∈ T with j > i do
   calculate w(ti, tj) according to Equation (6)
   if w(ti, tj) > 0 then
    add an edge between ti and tj with weight w(ti, tj) to E
   end if
  end for
 end for
 return G = (V, E)
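Algorithm 1 translates directly into Python. In this sketch, the weight function is passed in as a parameter that stands in for Equation (6). Each unordered pair of distinct tags is visited once, so there are no self-loops, and only pairs with positive weight become edges. gen_graph and the toy weight function in the usage lines are illustrative names, not part of the original method.

```python
from itertools import combinations


def gen_graph(tags, weight):
    """Algorithm 1: build a weighted, undirected graph of tags.

    weight(ti, tj) stands in for Equation (6).
    Returns (V, E): V is the set of tags, E maps frozenset({ti, tj}) -> w(ti, tj).
    """
    V = set(tags)
    E = {}
    # combinations() yields each unordered pair of distinct tags exactly once
    for ti, tj in combinations(sorted(V), 2):
        w = weight(ti, tj)
        if w > 0:  # pairs with no similarity get no edge
            E[frozenset((ti, tj))] = w
    return V, E


def toy_weight(a, b):
    """Hypothetical demo weight: 1.0 when two tags share their first three letters."""
    return 1.0 if a[:3] == b[:3] else 0.0


V, E = gen_graph(["web", "web20", "art"], toy_weight)
print(sorted(tuple(sorted(pair)) for pair in E))  # [('web', 'web20')]
```

Visiting pairs with j > i halves the work of the double loop and keeps the graph undirected. The cost is still quadratic in |T|, so large tag sets may need candidate pruning (for example, only pairs that co-occur or share a prefix). That pruning is not described in the original method.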

After generating a graph of tags, their communities are determined. In the following, the community detection algorithms and the reasons for using them in the proposed method will be explained.

3.2. Community Detection

Social networks have evolved significantly over the last decade, and community detection has emerged as a way to analyze many fields, including individuals’ interactions within social environments [41]. In this work, we use community detection to analyze tags. Detecting tag communities is therefore the second stage of the first phase of the proposed approach. We consider community detection the best way to analyze and cluster the network of tags. Its purpose is to extract groups of nodes whose internal connections are stronger than their connections to the rest of the network. This reveals the divisions in a network and gives a clearer view of its structure for analysis. Various community detection methods have been proposed. Here, a community is a group of tightly connected tag nodes. The strength of a connection is the degree of similarity between the tags, that is, their semantic and lexical similarity. Nodes that belong to the same community are therefore similar and related to the same interest. The better the identified communities, the more accurate the results of the recommendation stage. The criterion used here to recognize a good community structure is modularity, which is widely used in community detection methods. Modularity is defined by Newman and Girvan [42] in Equation (7); a higher modularity value indicates a stronger division of the network into communities:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ w(t_i, t_j) - \frac{k_i k_j}{2m} \right] \delta(t_i, t_j) \qquad (7)$$

where w(ti, tj) is the weight of the edge between nodes ti and tj, and k_i is the degree of node ti, defined in Equation (8):

$$k_i = \sum_{j} w(t_i, t_j) \qquad (8)$$

δ(ti, tj) equals 1 if nodes ti and tj belong to the same community and 0 otherwise. m is the total weight of all edges in the graph, defined in Equation (9):

$$m = \frac{1}{2} \sum_{i,j} w(t_i, t_j) \qquad (9)$$
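Equations (7) to (9) can be evaluated directly. The following Python sketch computes weighted modularity with the equivalent per-community form Q = Σ_c [in_c / m - (tot_c / 2m)²]. Here in_c is the total weight of edges inside community c, and tot_c is the sum of the degrees of its nodes. The sketch assumes an undirected graph without self-loops, such as the graph produced by Algorithm 1. The function name and the toy graph are illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Weighted Newman-Girvan modularity, Equations (7)-(9).

    edges: dict mapping (ti, tj) -> w(ti, tj), one entry per undirected edge, no self-loops.
    community: dict mapping each tag to its community label.
    """
    degree = defaultdict(float)          # k_i, Equation (8)
    for (ti, tj), w in edges.items():
        degree[ti] += w
        degree[tj] += w
    m = sum(edges.values())              # Equation (9): total edge weight
    if m == 0:
        return 0.0
    inside = defaultdict(float)          # weight of edges inside each community
    for (ti, tj), w in edges.items():
        if community[ti] == community[tj]:
            inside[community[ti]] += w
    tot = defaultdict(float)             # sum of node degrees per community
    for t, k in degree.items():
        tot[community[t]] += k
    # Per-community form of Equation (7)
    return sum(inside[c] / m - (tot[c] / (2 * m)) ** 2 for c in tot)


# Two triangles of tags joined by one bridge edge
edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
         ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
         ("c", "d"): 1.0}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, split), 4))  # 0.3571
```

Putting all tags in one community gives Q = 0. Splitting along the bridge gives 5/14 ≈ 0.357, which illustrates why a higher value indicates a stronger division into communities.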
In this research, the Louvain method is used to identify tag communities. It produces non-overlapping communities, so each tag belongs to exactly one community. The resource recommendation stage is explained next.
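The local-moving phase of the Louvain method can be sketched in pure Python. Each node is repeatedly moved to the neighboring community that gives the largest modularity gain, k_i,in - Σ_tot · k_i / 2m (the gain up to a constant factor), until no move helps. This is only the first phase. The full method also aggregates each community into a single node and repeats, and in practice a library implementation such as louvain_communities in NetworkX would normally be used instead. The function name and the example graph are hypothetical.

```python
from collections import defaultdict


def louvain_local_moving(edges, max_passes=100):
    """Local-moving (first) phase of the Louvain method.

    edges: dict mapping (u, v) -> weight, one entry per undirected edge, no self-loops.
    Returns a dict mapping each node to a community label (a node name).
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    k = {u: sum(nbrs.values()) for u, nbrs in adj.items()}  # node degrees k_i
    m = sum(edges.values())                                  # total edge weight
    community = {u: u for u in adj}                          # start from singletons
    if m == 0:
        return community
    sigma_tot = dict(k)                                      # total degree per community

    for _ in range(max_passes):
        moved = False
        for u in sorted(adj):
            current = community[u]
            links = defaultdict(float)          # weight from u into each community
            for v, w in adj[u].items():
                links[community[v]] += w
            sigma_tot[current] -= k[u]          # take u out of its community
            # Staying put is the baseline; move only on a strictly larger gain
            best = current
            best_gain = links[current] - sigma_tot[current] * k[u] / (2 * m)
            for c, k_in in links.items():
                gain = k_in - sigma_tot[c] * k[u] / (2 * m)
                if gain > best_gain:
                    best, best_gain = c, gain
            community[u] = best
            sigma_tot[best] += k[u]
            moved = moved or best != current
        if not moved:
            break
    return community


edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
         ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
         ("c", "d"): 1.0}
community = louvain_local_moving(edges)
groups = {}
for node, label in community.items():
    groups.setdefault(label, set()).add(node)
print(sorted(sorted(g) for g in groups.values()))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Because every node ends up with exactly one label, the result is non-overlapping, matching the property of the Louvain method noted above.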

3.3. Resources Recommendation Stage

After the communities are identified, the recommendation steps are carried out. First, C is defined as the set of communities detected by the community detection algorithm, where each community is a set of tags. Equations (10) and (11) define these sets, using a function that determines the community number of each tag. Next, for each resource and each community, a probability value is calculated that indicates how likely the resource is to be a member of that tag community. This value is defined by Equation (12) as follows:

In Equation (13), N(tk, ri, cj) is the number of tags from community cj that were used to tag resource ri. It makes it possible to determine which communities a resource is related to. The higher the probability, the more relevant the resource is to that community, because more tags from that community were used to tag the resource. By examining a user’s resources, it is then possible to determine the user’s interests in the various communities.

In this research, we found experimentally that resource communities overlap heavily. This high overlap arises because resource communities are created from tag communities, and it reduces the accuracy of the recommendations. Therefore, the resource communities are refined at this stage. Equation (12) gives the probability that a resource belongs to a community, and this value is compared with a threshold. Resources whose membership probability is below the threshold are excluded from that community. In this way, the resulting communities are more reasonable.
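Equations (12) and (13) are not reproduced in this text. The sketch below therefore assumes that the membership probability of resource ri in community cj is the share of ri’s tag assignments whose tag belongs to cj. It then applies the threshold-based refinement described above. Both the normalization and the threshold value are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def membership_probabilities(resource_tags, tag_community):
    """Assumed reading of Equations (12)-(13).

    resource_tags: dict resource -> list of tags assigned to it (repeats allowed)
    tag_community: dict tag -> community label (non-overlapping, e.g. from Louvain)
    Returns dict resource -> {community: probability}.
    """
    probs = {}
    for r, tags in resource_tags.items():
        # N(., r, c): how many of r's tag assignments fall in each community
        counts = Counter(tag_community[t] for t in tags if t in tag_community)
        total = sum(counts.values())
        probs[r] = {c: n / total for c, n in counts.items()} if total else {}
    return probs


def prune_communities(probs, threshold):
    """Keep a resource in a community only if its membership probability
    reaches the threshold (the threshold value is a tuning assumption)."""
    communities = {}
    for r, dist in probs.items():
        for c, p in dist.items():
            if p >= threshold:
                communities.setdefault(c, set()).add(r)
    return communities


tag_community = {"python": 0, "code": 0, "jazz": 1}
resource_tags = {"r1": ["python", "code", "python", "jazz"], "r2": ["jazz", "jazz"]}
probs = membership_probabilities(resource_tags, tag_community)
print(probs["r1"])                     # {0: 0.75, 1: 0.25}
print(prune_communities(probs, 0.3))   # {0: {'r1'}, 1: {'r2'}}
```

Without pruning, r1 would also belong to community 1 on the strength of a single “jazz” tag. The threshold removes exactly this kind of weak membership, which is the overlap the refinement step targets.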

After the membership of resources in the different communities has been determined, the similarity between two resources ri and rj is obtained with the Ellenberg similarity criterion in Equation (14), where m is the sum of the membership probabilities of ri and rj in the communities they share, b is the sum of the membership probabilities of ri in communities that rj does not belong to, and c is the sum of the membership probabilities of rj in communities that ri does not belong to. By calculating the similarity between the target user’s resources and the resources of the target user’s communities, a list of candidate resources for recommendation is obtained. More formally, let RTU be the set of the target user’s resources and RTC the set of resources of the target user’s communities. This candidate list has two problems. First, many resources in the list have the same similarity to the target user’s resources, which makes it difficult to choose the resources that best match the user’s interests. Second, the list contains many unrelated resources that nonetheless have a reasonable degree of similarity, so relying on this numerical similarity alone does not give the expected result. To solve these problems, a second similarity between two resources is used, defined by Equation (15).
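Equation (14) itself is not reproduced here. The Python sketch below assumes the quantitative form in which Ellenberg’s index is usually stated, Sim = (m/2) / (m/2 + b + c), where m is halved because it adds the probabilities of both resources in the shared communities. The function name and the dictionary input are illustrative.

```python
def ellenberg_similarity(p_i, p_j):
    """Quantitative Ellenberg index between two resources (assumed form of Eq. (14)).

    p_i, p_j: dict community -> membership probability of ri / rj (after refinement).
    """
    shared = p_i.keys() & p_j.keys()
    m = sum(p_i[k] + p_j[k] for k in shared)              # both resources, shared communities
    b = sum(v for k, v in p_i.items() if k not in shared)  # ri only
    c = sum(v for k, v in p_j.items() if k not in shared)  # rj only
    denom = m / 2 + b + c
    return (m / 2) / denom if denom > 0 else 0.0
```

Two resources with identical community profiles score 1, resources with no shared community score 0, and, for example, {c1: 0.6, c2: 0.4} versus {c1: 0.5, c3: 0.5} gives 0.55 / 1.45 ≈ 0.379.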

In Equation (15), U(ri) denotes the set of users who annotated ri with tags. Equation (16) then determines, for each resource ri of the target user (ri ∈ RTU), the resources rj of the target communities (rj ∈ RTC) that are most similar to it, denoted Msr(ri). Finally, the list of recommended resources is obtained by Equation (17).
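Equation (15) is not shown in the text; a common choice for comparing two resources through the sets of users who tagged them is the Jaccard coefficient, and the short Python sketch below assumes that form. The function name is hypothetical.

```python
def user_overlap_similarity(users_i, users_j):
    """Assumed form of Equation (15): Jaccard overlap of U(ri) and U(rj)."""
    union = users_i | users_j
    return len(users_i & users_j) / len(union) if union else 0.0
```

For U(ri) = {u1, u2} and U(rj) = {u2, u3} this gives 1/3. Unlike the community-based similarity, it separates resources that happen to fall in the same communities but were tagged by different people.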

The pseudocode for generating the recommended list is shown in Algorithm 2.

R_List = Generate_recommended_list (n_com, RTU, RTC, C, threshold)
Input: n_com = number of communities, RTU: set of the target user resources,
RTC: set of resources of the target communities, C = {ci | ci is a community of the target user},
threshold: minimum membership probability used to refine resource communities
Output: R_List: list of recommended resources for the target user.
 L = {}
 R_List = {}
 for each ri ∈ RTU ∪ RTC
  for each cj ∈ C
   calculate p, the membership probability of ri in cj, according to Equations (12) and (13)
   if p ≥ threshold then
    A(i, j) = p
   else
    A(i, j) = 0
  end for
 end for
 for each ri ∈ RTU
  for each rj ∈ RTC
   m = 0; b = 0; c = 0;
   for k = 1 to n_com
    if (A(i, k) ≠ 0 And A(j, k) ≠ 0) then m = m + A(i, k) + A(j, k)
    else if (A(i, k) ≠ 0) then b = b + A(i, k)
    else if (A(j, k) ≠ 0) then c = c + A(j, k)
   end for
   calculate Sime(ri, rj) from m, b, and c according to Equation (14)
   calculate Simu(ri, rj) according to Equation (15)
   append (L, ri, rj, Simu(ri, rj) + Sime(ri, rj))
  end for
 end for
 for each ri ∈ RTU
  calculate Msr(ri) from L according to Equation (16)
  append (R_List, Msr(ri))
 end for
 return (R_List)   according to Equation (17)
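A runnable Python translation of Algorithm 2 is sketched below. Because Equations (14) to (17) are not reproduced in the text, it assumes the quantitative Ellenberg form for Sime, a Jaccard overlap of user sets for Simu, Msr(ri) as the single highest-scoring candidate for ri, and R_List as the distinct Msr values. Skipping resources the user already has and ignoring zero scores are further assumptions, marked in the comments.

```python
def generate_recommended_list(membership, threshold, rtu, rtc, users_of):
    """Sketch of Algorithm 2 under stated assumptions.

    membership: dict resource -> {community: probability} (Equations (12)-(13))
    threshold: refinement cut-off for membership probabilities (value not reported)
    rtu: resources of the target user (RTU); rtc: resources of the target communities (RTC)
    users_of: dict resource -> set of users who tagged it, i.e. U(r)
    """
    # A(i, k): keep a membership probability only if it reaches the threshold.
    A = {r: {c: p for c, p in membership.get(r, {}).items() if p >= threshold}
         for r in set(rtu) | set(rtc)}
    L = []                                    # (ri, rj, Simu + Sime) triples
    for ri in rtu:
        for rj in rtc:
            if rj in rtu:                     # assumption: skip resources the user already has
                continue
            m = b = c = 0.0
            for k in A[ri].keys() | A[rj].keys():
                if k in A[ri] and k in A[rj]:
                    m += A[ri][k] + A[rj][k]
                elif k in A[ri]:
                    b += A[ri][k]
                else:
                    c += A[rj][k]
            denom = m / 2 + b + c
            sim_e = (m / 2) / denom if denom else 0.0      # assumed Ellenberg form, Eq. (14)
            ui, uj = users_of.get(ri, set()), users_of.get(rj, set())
            sim_u = len(ui & uj) / len(ui | uj) if ui | uj else 0.0  # assumed Jaccard, Eq. (15)
            L.append((ri, rj, sim_u + sim_e))
    # Msr(ri): best-scoring candidate for each ri (assumed reading of Eq. (16)).
    best = {}
    for ri, rj, score in L:
        if score > 0 and (ri not in best or score > best[ri][1]):  # assumption: ignore zero scores
            best[ri] = (rj, score)
    # R_List: distinct Msr(ri) values in the order of RTU (assumed reading of Eq. (17)).
    r_list = []
    for ri in rtu:
        if ri in best and best[ri][0] not in r_list:
            r_list.append(best[ri][0])
    return r_list
```

In a small example where r1 belongs mainly to community 0, a candidate sharing that community and one of r1’s users outranks a candidate that shares only users, so it is returned first.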

4. Performance Evaluation of the Proposed Solution

4.1. Experimental Dataset

Collecting information is one of the main parts of every recommender system; if it is done systematically and accurately, the data can be analyzed quickly and accurately [43]. Two published datasets were used to evaluate the proposed method:
(1) Del.icio.us: the hetrec2011-delicious-2k dataset, widely used in the experiments of Zuo et al. [44] and Xu et al. [45], contains 53,388 tags, 1,867 users, and 69,226 resources (bookmarked URLs). It was gathered from Del.icio.us and released by Cantador et al. [46]. In Del.icio.us, users can save and organize their favorite web pages (URLs) and also tag and share them as they wish. Users are connected in a social network built from their Del.icio.us interactions, and each user has their own tags, bookmarks, and tag assignments.
(2) Last.Fm: an artist recommendation dataset gathered from the Last.Fm music system (http://www.last.fm.com), in which users can tag artists; each user therefore has a list of tag assignments to artists [47]. This dataset includes 11,946 tags, 1,892 users, and 17,632 artists.

Before using these datasets, we removed noisy and meaningless tags, namely tags containing special characters and numbers. Python scripts were used to clean the datasets.
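The exact cleaning rule is not given, so the following Python sketch assumes one simple interpretation: a tag is kept only if it contains at least one letter and consists solely of letters, spaces, hyphens, or underscores. Because `str.isalpha` accepts non-ASCII letters, tags in other languages survive. Both function names are hypothetical.

```python
def is_clean_tag(tag):
    """Hypothetical rule: reject tags containing digits or other special characters."""
    tag = tag.strip()
    return any(ch.isalpha() for ch in tag) and all(
        ch.isalpha() or ch in " -_" for ch in tag)


def clean_assignments(assignments):
    """Drop (user, tag, resource) tuples whose tag fails is_clean_tag."""
    return [a for a in assignments if is_clean_tag(a[1])]
```

Under this rule "web-design" and "música" are kept, while "web2.0", "2010" and "!!!" are removed.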

Unlike most previous methods, this work uses tags regardless of how many times they are repeated, so infrequent tags are not discarded; therefore, the proposed solution is better suited to the cold-start problem. The data are then split into a training set (80%) and a test set (20%). Recommendations are generated from the known information in the training set, and the test set is then used to evaluate the performance of the recommendation algorithms.
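The text does not state the unit of the 80/20 split or how it is randomized. The Python sketch below assumes a random split over tag assignments with a fixed seed for reproducibility; the function name and the seed are illustrative.

```python
import random


def split_train_test(assignments, test_ratio=0.2, seed=42):
    """Random split of tag assignments into training (80%) and test (20%) sets."""
    items = list(assignments)
    random.Random(seed).shuffle(items)        # private RNG so the split is reproducible
    n_test = int(round(len(items) * test_ratio))
    return items[n_test:], items[:n_test]     # (train, test)
```

A per-user split, in which 20% of each user’s assignments are held out, is a common alternative that guarantees every test user also appears in training.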

4.2. Experimental Parameters and Baseline Methods

In the first step of the proposed method, a graph of all user tags is generated. For co-occurring tags, the edge weight between two tags is determined using both Jaccard similarity and a lexical similarity based on Levenshtein distance, and the greater of the two is selected; the time of the tag assignments is also taken into account. For tags that do not co-occur, only the lexical similarity (simLev) is used, compared against a threshold α. Based on various experiments, α was set to 0.7 for co-occurring tags and 0.8 otherwise. If the lexical similarity exceeds this threshold, it is scaled by 50% and applied as the edge weight.
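The edge-weight rule above can be sketched in Python as follows. Several details are assumptions: simLev is taken as 1 - distance / length of the longer tag, two tags are considered co-occurring when they annotate at least one common resource, Jaccard similarity is computed over the sets of resources each tag annotates, and the time factor is omitted. The helper names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def sim_lev(a, b):
    """Lexical similarity in [0, 1] (normalisation by the longer tag is assumed)."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def jaccard(res_a, res_b):
    union = res_a | res_b
    return len(res_a & res_b) / len(union) if union else 0.0


def edge_weight(tag_a, tag_b, resources_of):
    """One reading of the rule in the text; the time factor is omitted."""
    ra, rb = resources_of.get(tag_a, set()), resources_of.get(tag_b, set())
    co_occur = bool(ra & rb)                  # assumed: share at least one resource
    alpha = 0.7 if co_occur else 0.8          # thresholds reported in the text
    s = sim_lev(tag_a, tag_b)
    lexical = 0.5 * s if s > alpha else 0.0   # scaled by 50% when above the threshold
    return max(jaccard(ra, rb), lexical) if co_occur else lexical
```

Under these assumptions the misspelled tag "recomender" is linked to "recommender" with weight 0.5 × (1 - 1/11) ≈ 0.45 even though the two never co-occur, which is exactly the case the lexical similarity is meant to capture.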

To show the efficiency of the proposed method, it is compared with the following models:
(1) CCS (clustering-based cosine similarity): a cosine-similarity method based on clustering. Hierarchical clustering by Xu et al. [48] is used to model users and resources as vectors of cluster-based attributes, and recommendations are made by content-based filtering using cosine similarity. The proposed method differs from CCS in two respects: it builds the tag graph from combined semantic and lexical similarity, and it analyzes this graph with community detection, a robust graph-analysis technique. The results confirm the advantage of these two choices.
(2) ACF (autoencoder-based collaborative filtering): collaborative filtering (CF) built on an autoencoder, which learns compact representations of user profiles on which the CF recommendations are based. Experiments with different numbers of hidden layers show that deeper architectures can work better if the depth of the network is set appropriately [45].
(3) CCF (clustering-based collaborative filtering): similar to CCS, but recommendations are made with user-based CF [49].
(4) PMF (probabilistic matrix factorization): a collaborative filtering technique that factorizes the user-item rating matrix. It assumes that users who have rated similar sets of items are likely to have similar preferences [50]. This method was chosen to demonstrate the benefit of using an additional dimension of information, namely tags.
(5) KGAT: a state-of-the-art knowledge-graph-based model that performs knowledge-aware attentive graph convolution over the knowledge graph (KG) to model high-order relations [47].

Precision and recall are significantly improved by the proposed algorithm compared with these known algorithms. The results of comparing the proposed method with them are shown in Table 1.

4.3. Evaluation Metrics

The criteria most often used to evaluate the quality of recommender systems are precision (P) and recall (R). Precision is the fraction of the resources recommended by a method that are correct, so it measures the accuracy of the recommendations: the larger it is, the fewer errors the method makes. Recall is the fraction of the resources the user is actually interested in that the method recommends. Following Zhang et al. [26], P and R are defined in Equations (18) and (19):

$$P = \frac{|rr \cap tr|}{|rr|} \tag{18}$$

$$R = \frac{|rr \cap tr|}{|tr|} \tag{19}$$

where rr is the list of recommended resources and tr is the list of test resources. Since users usually review only the top recommended items, both criteria are cut off at a rank k: only the first k results of the recommendation list are considered, giving precision at k (P@k) and recall at k (R@k). In the experiments, the mean values of P@k and R@k over users are used to evaluate the performance of the system. Higher values of both criteria indicate better quality. All experiments were run on an Intel® Core i7 computer with a 2.67 GHz CPU and 16.00 GB of RAM.
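As a worked illustration of P@k and R@k, the following Python sketch truncates the recommendation list rr to its first k entries, intersects it with the test resources tr, and averages the per-user values. The function names and the per-user dictionary format are illustrative; precision divides by the number of recommendations actually returned (at most k), matching Equation (18) applied to the truncated list.

```python
def precision_recall_at_k(recommended, test_items, k):
    """P@k and R@k for one user: Equations (18)-(19) applied to the top-k list."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(test_items))
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(set(test_items)) if test_items else 0.0
    return precision, recall


def mean_at_k(per_user, k):
    """Mean P@k and R@k; per_user maps user -> (recommended list, test items)."""
    scores = [precision_recall_at_k(rec, test, k) for rec, test in per_user.values()]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

For a user with recommendations [a, b, c, d, e] and test resources {b, e, x}, P@5 = 2/5 = 0.4 and R@5 = 2/3.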

4.4. Experimental Results and Analysis

Two research questions have been raised in this section, and the experiments were designed to address these questions.

RQ1: How effective is the lexical similarity in the proposed method? To answer RQ1, the proposed method with lexical similarity (LEXSEM_CDR) is compared with a variant without it (SEM_CDR). Both were evaluated with two metrics, Recall@k and Precision@k, for four values of k: 5, 10, 15, and 20. Table 1 presents the experimental results. They show that adding lexical similarity to semantic similarity improves system performance, and the improvement is especially evident when tags contain spelling mistakes. In these experiments, the threshold was set to 0.7 for co-occurring tags and 0.8 otherwise. Because spelling mistakes are common when tags are assigned to resources, using lexical similarity proved more practical than spending time correcting them.

Tables 1 and 2 show that, when both similarities are applied, the method performs better on the Del.icio.us dataset than on Last.Fm, owing to the greater diversity of its tags.

The second research question is addressed next.

RQ2: How effective is the time of tag assignment in the proposed method? Because users’ interests change over time, the time difference between the last assignments of two co-occurring tags is taken into account when generating the graph. Tables 3 and 4 present the results of the proposed method with this time information (CDR_TIME) and compare them with LEXSEM_CDR. The results show that using the time of tag assignment improves the accuracy of the recommender system.
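The text does not specify how the time difference enters the edge weight. Purely as an illustration, the Python sketch below applies an exponential decay with a hypothetical half-life, so that two co-occurring tags whose last assignments are far apart in time are linked more weakly. The function name, the half-life of 180 days, and the use of Unix timestamps in seconds are all assumptions.

```python
import math


def time_weighted_cooccurrence(base_weight, last_time_a, last_time_b, half_life_days=180.0):
    """Hypothetical time factor: halve the co-occurrence weight for every
    half_life_days between the last assignments of the two tags.

    last_time_a, last_time_b: Unix timestamps (seconds) of each tag's last assignment.
    """
    gap_days = abs(last_time_a - last_time_b) / 86400.0   # seconds -> days
    return base_weight * math.pow(0.5, gap_days / half_life_days)
```

With these assumed settings, a co-occurrence weight of 0.8 is unchanged when both tags were last used at the same time and drops to 0.4 when their last uses are 180 days apart.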

In the Last.Fm dataset, users’ tastes vary more over time, so applying the time parameter changes the results noticeably more than in the Del.icio.us dataset.

The results in Tables 5 and 6 show that the proposed method significantly improves both precision and recall. Besides the improved results, a major advantage of the proposed method is that it uses only the training data and does not rely on any external knowledge base or resource content, which makes it agile and simple. As explained in the previous sections, tags have no specific format and users choose them without restrictions; this was the main reason for not using external knowledge.

On the other hand, KGAT becomes more accurate as the number of recommendations increases, because it extracts latent relations better. Nevertheless, the simplicity of the proposed method is an advantage over KGAT.

4.5. Threats to Validity

This section briefly lists the internal and external threats to the validity of the research findings identified after various reviews. Internal threats: the accuracy of the proposed method depends on the quality of the input dataset. In this work, after the noisy and meaningless data were removed, all the data were divided into training and test sets and the proposed method was applied to them. Examining the results on two datasets and comparing them with state-of-the-art methods led to the conclusion that the method works well. The parameter values used to calculate tag similarity were chosen by performing various tests; some of the results were randomly selected and checked manually, and this review confirmed that appropriate values had been chosen.

External threats: the generalizability of the algorithm is one external threat. To address it, the proposed method was tested on two datasets, and the results on both confirm its superiority, which suggests that it can be applied to similar datasets. The algorithm’s performance may depend on the quality and size of the dataset used to train and test it: if the dataset is biased or not representative, the results may not accurately reflect the relationships between tags, users, and resources, which will affect the accuracy of the recommender system. As is common, 80% of the data is used for training and 20% for testing. Tag usage behavior may also change over time or in response to different situations, which may affect the accuracy of the results. To deal with this threat, the proposed method uses the time parameter to account for users’ behavioral changes over time. However, the datasets used are static, and the effect of their dynamics, which was not specifically investigated in this research, needs further study.

5. Conclusion and Future Works

In this paper, we examined the problems of tag-based recommender systems and concluded that their performance is affected by semantic and lexical ambiguities. Various solutions have been proposed in this field, most of which rely on external knowledge and thesauri. Because tags in some cases do not follow any specific rules, these solutions are not suitable, especially for datasets such as Del.icio.us, where most tags are not covered by thesauri. Therefore, using co-occurring tags, the time of tag assignments, and statistical and mathematical methods, we identified the semantic and lexical similarity between tags more accurately. Using community detection, the proposed method obtains a suitable model of users’ interests, and based on this model it achieves better recommendations, as the experimental results confirm. The performance of the proposed method was evaluated using precision and recall on the Del.icio.us and Last.Fm datasets. The evaluation results show that the precision and recall of the proposed method are significantly improved compared with the other methods; according to the experimental results, recall and precision improved on average by 5% and 7%, respectively. In future work, we plan to use more advanced community detection methods to cluster tags and obtain more accurate results, and to develop a way to eliminate the semantic ambiguity of tags.

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Disclosure

A preprint of the current research was previously published by Shokrzadeh et al. [51].

Conflicts of Interest

The authors declare that they have no conflicts of interest.