Open Access

X-distribution: Retraceable Power-law Exponent of Complex Networks

Published: 27 February 2024


Abstract

Network modeling has been explored extensively by means of theoretical analysis as well as numerical simulations for Network Reconstruction (NR). The network reconstruction problem requires the estimation of the power-law exponent (γ) of a given input network. Thus, the effectiveness of the NR solution depends on the accuracy of the calculation of γ. In this article, we re-examine the degree distribution-based estimation of γ, which is not very accurate due to approximations. We propose X-distribution, which is more accurate than degree distribution. Various state-of-the-art network models, including CPM, NRM, RefOrCite2, BA, CDPAM, and DMS, are considered for simulation purposes, and simulated results support the proposed claim. Further, we apply X-distribution over several real-world networks to calculate their power-law exponents, which differ from those calculated using respective degree distributions. It is observed that X-distributions exhibit more linearity (straight line) on the log-log scale than degree distributions. Thus, X-distribution is more suitable for the evaluation of power-law exponent using linear fitting (on the log-log scale). The MATLAB implementation of power-law exponent (γ) calculation using X-distribution for different network models and the real-world datasets used in our experiments are available at https://github.com/Aikta-Arya/X-distribution-Retraceable-Power-Law-Exponent-of-Complex-Networks.git.


1 INTRODUCTION

Networked systems are ubiquitous in nature, for example, transportation networks [6, 14], social networks [13], biological networks [12], and communication networks [21, 22]. They are analyzed as graphs or networks to understand their complex dynamics. In the past two decades, the problem of Structural Reconstruction of real-world networks has received a lot of attention. The structural reconstruction of a real-world network is concerned with reconstructing a given network by using both a network model and limited information about the network [1]. Reconstruction means that the generated network should possess the same collective spectral and structural properties as the input real-world network. In the literature, various network-generating models have been proposed to understand and study the evolution of real-world networks. These models reproduce patterns and properties of real-world networks, such as degree distribution, clustering, triangle formation, and small-world phenomena [2, 7, 12, 18, 21]. They are used to generate synthetic networks that resemble real-world networks and are widely used to understand network evolution and the dynamic processes taking place on these networks, such as influence propagation, opinion formation, anomaly detection, and so on.

The first well-known network model in this direction is the Barabási–Albert (BA) model [3], in which each new node makes \(\overline{k}\) connections with the existing nodes, and the probability of connecting with an existing node is directly proportional to its degree. This leads to the rich-get-richer phenomenon, and the degree distribution of the generated network follows a power law, i.e., it is approximated as \(p(k) = c \cdot k^{-\gamma }\). Several models have since been proposed, including the fitness model [5], triad-formation model [15], local-world model [19], mutual attraction model [28], copying model [16], Network Reconstruction Model (NRM) [24], RefOrCite2 Model [25], Context Dependent Preferential Attachment Model (CDPAM) [23], Dorogovtsev, Mendes, and Samukhin (DMS) model [10], and so on [26]. All these network-generation models primarily focus on the network's degree distribution so that the generated network follows the expected power-law degree distribution.

In the network reconstruction process of scale-free networks, estimating the power-law exponent of a given real-world network is required [24]. The effectiveness of a network reconstruction solution depends on the accuracy of the power-law exponent calculation. Most of the state-of-the-art network models follow the power law only under approximations, which may result in an error-prone estimation of the power-law exponent. Under these approximations, the model-generated networks follow a power law in their tail only (high-degree nodes) [9, 10, 23, 24].

Motivation: We consider the copying model (CPM) [16], in which nodes arrive in a sequence one by one. A newly arrived node j selects an older (existing) node i uniformly at random, and then j connects to the neighbors of node i (reached via the outgoing edges of i) with probability p; see Figure 1. The power-law exponent for CPM is \(\gamma =1/p\). By setting \(p\in \lbrace 0.1,\;0.2,\;0.3,\;0.4,\;0.5,\;0.6,\;0.7,\;0.8,\;0.9, 0.12,\;0.15\rbrace\), we simulate networks of size \(n=10^5\). Using the degree distributions of the simulated networks, the calculated values of \(\gamma\) are \(\lbrace 4.9,\;3.5,\;2.8,\;2.2,\;1.9,\;1.7,\;1.5,\;1.3,\;1.3,\;4.5,\;4.1\rbrace\), listed in the same order as the selected values of p. The expected values of \(\gamma\) for the selected values of p are \(\lbrace 10.0,\;5.0,\;3.3,\;2.5,\;2,\;1.7,\;1.4,\;1.25,\;1.1,\;8.3,\;6.7\rbrace\). The values of \(\gamma\) obtained from the simulated networks deviate significantly from the expected values, most of all for small p (for example, 4.9 against an expected 10.0 at \(p=0.1\)).


Fig. 1. Network evolution dynamics of CPM. A newly inserted node j at time t connects to an already existing node i with the given probability \(1/t\) and establishes links with neighbors of node i with probability p.
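The growth rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function name `simulate_cpm`, the choice to start from a single isolated node, and the seeding are all hypothetical. Each new node j picks a base node i uniformly at random, links to i, and copies each outgoing edge of i independently with probability p.

```python
import random


def simulate_cpm(n, p, seed=0):
    """Grow a directed copying-model network with n nodes (illustrative sketch).

    Returns (out_neighbors, in_degree). Node 0 starts with no edges, which is
    an assumption made here for simplicity.
    """
    rng = random.Random(seed)
    out_neighbors = [[] for _ in range(n)]
    in_degree = [0] * n
    for j in range(1, n):
        # Base node chosen uniformly among the j nodes that already exist.
        i = rng.randrange(j)
        targets = [i]
        # Copy each outgoing edge of the base node with probability p.
        targets += [l for l in out_neighbors[i] if rng.random() < p]
        out_neighbors[j] = targets
        for l in targets:
            in_degree[l] += 1
    return out_neighbors, in_degree


out_neighbors, in_degree = simulate_cpm(1000, 0.5)
# Total degree of node i is in-degree plus out-degree; the out-degree is
# fixed at arrival and plays the role of the initial degree.
degree = [in_degree[i] + len(out_neighbors[i]) for i in range(1000)]
```

In this sketch the out-degree of a node never changes after it arrives, so `len(out_neighbors[i])` can serve as the initial degree of node i when degree-derived quantities are computed later.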

This motivates us to re-investigate the degree distribution for other models that are used for structural reconstruction. If the degree distribution is not adequate for computing the parameter \(\gamma\), then another variable, similar to degree, is required to calculate \(\gamma\). In addition, the literature shows that various network growth processes follow power-law degree distributions in the tail only, under the condition that the network is very large [9, 10, 23, 24]. Thus, it is essential to define a new property that evaluates the power-law exponent \(\gamma\) of a given network more accurately and that remains more consistent as the size of the network changes. In this article, a variable \(X_i\) for node i is considered that is derived from the degree of the node and a constant. For various growing scale-free networks, \(X_i\) follows a scale-free (power-law) distribution for \(X_i\gt 0\), whereas the respective degree distributions follow a scale-free (power-law) distribution only for higher values (\(k_i\gg 1\)). A novel method for more accurate power-law exponent computation is proposed based on the distribution of X in a growing scale-free network under a given model or growth dynamics. The proposed method is compared with the degree distribution-based power-law exponent computation method proposed in Reference [8].

Contributions: This article makes the following contributions:

  • X-distribution (a derivative of degree) is defined, which is more accurate and consistent in calculating the power-law exponent of given networks.

  • Extensive experimentation over different state-of-the-art network models, including CPM [16], NRM [24], RefOrCite2 Model [25], BA [3], CDPAM [23], and DMS model [10], demonstrates the effectiveness of X-distribution. We also apply the proposed algorithm to calculate the power-law exponents of X-distributions for various real-world networks and compare them with the degree distribution-based method.

The rest of the article is organized as follows: Section 2 discusses the limitation of degree distribution, defines X-distribution, and proposes an algorithm to calculate the power-law exponent \(\gamma\) for a given network. In Section 3, X-distribution and degree distribution are applied to retrace the microdynamics (\(\gamma\)) of the networks obtained under CPM, NRM, RefOrCite2, BA, CDPAM, and DMS models. The comparative analysis of degree distribution and X-distribution indicates that X-distribution estimates \(\gamma\) more accurately and consistently. Finally, the work is concluded in Section 4.


2 X-DISTRIBUTION

Degree distribution to X-distribution: Here, we discuss the way we define X-distribution using the degree of nodes and its advantages over degree distribution.

We consider the copying model of References [4, 16] to explain X-distribution. Let \(k_i^{\text{in}}(t)\), \(k_i^{\text{out}}(t)\), and \(k_i(t)\) (\(=k_i^{\text{in}}+k_i^{\text{out}}\)) be the in-degree, out-degree, and degree of node i, respectively, at time t. The degree of node i can grow in two ways: either a newly arriving node j attaches to node i directly with probability \(\frac{1}{t}\) (Figure 1), or node j first attaches to one of the neighbors of node i (the nodes of incoming edges, \(\mathcal {N}_i\)) and then to node i with probability p (i.e., with probability \(p \frac{1}{t}\)); see Figure 2. Thus, (1) \(\begin{equation} \dfrac{d k_i(t+1)}{d t}=\dfrac{1}{t}+\left(1-\dfrac{1}{t}\right) \sum _{l\in \mathcal {N}_i} p \dfrac{1}{t}=\dfrac{1+p k_i}{t}-\dfrac{p k_i}{t^2}. \end{equation}\) Neglecting the \(1/t^2\) term for large t and applying the mean-field approximation, \(\begin{equation} \frac{1}{p} \int \frac{d p k_i(t)}{1 + p k_i(t)} = \int \frac{dt}{t}. \end{equation}\)

Imposing the boundary condition \( k_i(t_i) = k_i^{\text{out}}(t)= k_i^0\) gives \(\begin{align} \ln \dfrac{ k_i(t+1) p +1}{k^0_i p+1}&=p \ln \dfrac{t+1}{t_i}, \\ \frac{k_i(t+1)+1/p}{k^0_i +1/p}&=\left(\dfrac{t+1}{t_i}\right)^p. \end{align}\)

For \(k_i(t+1)\) to exceed k, we need (2) \(\begin{equation} t_i\lt (t+1)(k+1/p)^{-1/p} (k^0_i +1/p)^{1/p}. \end{equation}\)

Since nodes arrive uniformly in time, we have (3) \(\begin{equation} \Pr (k_i\gt k) \sim (k+1/p)^{-1/p}(k^0_i +1/p)^{1/p}, \end{equation}\) where \(k_i = \lim _{t \rightarrow \infty }k_i(t)\).


Fig. 2. Node i also gets new connection when j gets connected with neighbors (incoming) of node i under evolution dynamics of CPM. A node j newly introduced at time t connects to an older base node \(i_4\) with probability \(1/t\) and then gets connected with one of the first neighbors (out-links only) of node \(i_4\) with probability p.

Thus, the degree distribution closely follows a power law but depends on the initial degree, and this dependency introduces approximation and additional error in curve fitting while retracing the model parameters. To work around the initial condition, we consider the variable (4) \(\begin{equation} X_i =\dfrac{k_i(t+1)+1/p}{k^0_i +1/p} \end{equation}\) instead of the degree \(k_i(t+1)\). The event \(X_i \gt x\) corresponds to \(((t+1)/t_i)^p \gt x\), or \(t_i \lt (t+1) x^{-1/p}\), implying that \(\begin{equation*} \Pr (X_i \gt x) = x^{-1/p}, \end{equation*}\) which is an exact power law and therefore reduces the error in retracing the model parameters using curve fitting.
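The tail probability \(\Pr (X_i \gt x) = x^{-1/p}\) can be checked numerically with a short sketch. It assumes idealized mean-field growth, i.e., one node arriving at each time step \(t_i = 1, \ldots, t\) and \(X_i = ((t+1)/t_i)^p\) exactly, so it illustrates the argument above rather than a simulated network. The function name `x_tail_fraction` is hypothetical.

```python
def x_tail_fraction(p, x, t=100_000):
    """Fraction of nodes with X_i > x under idealized mean-field growth.

    Assumes arrival times t_i = 1..t and X_i = ((t + 1) / t_i) ** p.
    """
    xs = [((t + 1) / t_i) ** p for t_i in range(1, t + 1)]
    return sum(1 for v in xs if v > x) / t


# Compare the empirical tail fraction with the closed form x ** (-1 / p).
for p in (0.2, 0.5, 0.9):
    for x in (1.5, 2.0, 4.0):
        print(p, x, round(x_tail_fraction(p, x), 4), round(x ** (-1 / p), 4))
```

The two printed columns should agree closely for every pair of p and x, independent of any initial degree, which is the property that motivates fitting X rather than the degree itself.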

X-distribution: Now, we define (5) \(\begin{equation} X_i =\dfrac{k_i+\mathcal {C}}{k^0_i +\mathcal {C}}, \end{equation}\) where \(\mathcal {C}\) is a constant. The distribution of the variable \(X_i\) is called the X-distribution. Comparing Equations (4) and (5) shows that the constant \(\mathcal {C}\) depends on the model parameters. Under the mean-field approximation applied to Equation (1), \(\mathcal {C}=\gamma =1/p\) as \(t\longrightarrow \infty\). For networks of limited size generated according to Equation (1), however, the value of \(\mathcal {C}\) can differ from \(\gamma\) and \(1/p\).

For different models that are working on the framework of the BA model, we can get constant \(\mathcal {C}\) and \(\gamma\) using the following comparative analysis: (6) \(\begin{equation} \begin{split} \dfrac{d k_i(t+1)}{d t}&=\dfrac{1}{\gamma }\left(\dfrac{k_i(t)+\mathcal {C} }{t} \right),\\ \Pr (X_i \gt x) &= x^{-\gamma }. \end{split} \end{equation}\)
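As a worked step, the passage from the growth equation to the tail probability in Equation (6) can be written out as follows. This is a sketch under the same assumptions used for Equation (3): mean-field growth, uniform arrival times, and \(t+1 \approx t\) for large t. \(\begin{align} \int _{k_i^0}^{k_i} \frac{dk}{k+\mathcal {C}} = \frac{1}{\gamma }\int _{t_i}^{t}\frac{d\tau }{\tau } \;&\Longrightarrow \; X_i=\frac{k_i+\mathcal {C}}{k_i^0+\mathcal {C}}=\left(\frac{t}{t_i}\right)^{1/\gamma }, \\ \Pr (X_i\gt x) = \Pr \left(t_i \lt t\,x^{-\gamma }\right)&=x^{-\gamma }, \quad x\ge 1. \end{align}\) For CPM, the leading term of Equation (1) is \(\frac{1+p k_i}{t}=p\,\frac{k_i+1/p}{t}\), which matches this form with \(\gamma =1/p\) and \(\mathcal {C}=1/p\).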

If the growth equation of a model can be written in the form of Equation (6), then we can get the value of \(\gamma\) in terms of model parameters.

Equation (6) produces a better approximation than Equation (3); thus, an algorithm (Algorithm 1) is proposed to calculate \(\gamma\) more accurately using X-distribution. Algorithm 1 is divided into four blocks, namely, B(I), B(II), B(III), and B(IV). Block B(I) initializes the variable C, which ranges from 0.001 to 50 in steps of 0.01. For each value of C (the for loop in line 5), the values of \(X_i\) are calculated in block B(II), the cumulative frequency of \(X_i\) (\(Y1_i\)) corresponding to the unique values of \(X_i\) is calculated in B(III), and, finally, the linear fitting on the log-log scale and the error estimation are done in B(IV) using the MATLAB functions polyfit and polyval. The estimate of \(\gamma\) is the negative of the slope of the linear fit (line 18 in Algorithm 1). Algorithm 1 reports the \(\gamma\) (in line 20) that corresponds to the minimum error.
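The four blocks described above can be sketched in plain Python as follows. This is an illustrative reading of the description, not the authors' MATLAB code: the function names are hypothetical, the cumulative frequency is assumed to be the number of nodes with X greater than or equal to each unique value, and the fitting error is assumed to be the sum of squared residuals of the log-log line. A hand-written least-squares fit stands in for polyfit and polyval.

```python
import math
from collections import Counter


def loglog_fit(xs, ys):
    """Least-squares line through (log x, log y); returns slope, intercept, SSE."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(lx, ly))
    return slope, intercept, sse


def x_cumulative(degrees, initial_degrees, c):
    """Unique values of X_i = (k_i + c) / (k0_i + c) with counts of nodes having X >= value."""
    counts = Counter((k + c) / (k0 + c) for k, k0 in zip(degrees, initial_degrees))
    xs, ys, running = [], [], 0
    for x in sorted(counts, reverse=True):
        running += counts[x]
        xs.append(x)
        ys.append(running)
    return xs, ys


def estimate_gamma(degrees, initial_degrees, c_grid):
    """Scan c_grid and return (gamma, c, error) for the best log-log linear fit."""
    best = None
    for c in c_grid:
        xs, ys = x_cumulative(degrees, initial_degrees, c)
        if len(xs) < 2:
            continue  # a line cannot be fitted to a single point
        slope, _, sse = loglog_fit(xs, ys)
        if best is None or sse < best[2]:
            best = (-slope, c, sse)  # gamma is the negative slope
    return best
```

A grid equivalent to the range described above (0.001 to 50 in steps of 0.01) could be built as `[0.001 + 0.01 * j for j in range(5000)]`. The inputs are the final degree and the initial degree of every node, in the same order.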

Now, we consider a network dataset to illustrate the implementation of Algorithm 1. Process: In the first step, the algorithm calculates \(X_i\) for all the nodes in the considered network for one value of the constant C (block B(II) in Algorithm 1), say \(C=4.9610\). Next, we calculate the X-distribution (in block B(III)), perform the linear fitting (on log-log scale) of the obtained X-distribution (in block B(IV); the fitting is shown in Figure 3(a)), and calculate the error in the fitting. This process is repeated for all considered values of \(C=0.001:0.01:50\). Block B(IV) (lines 15–18) stores the error and the corresponding power-law exponent whenever the error for the current value of C is less than that for all previously explored values of C. Finally, after Algorithm 1 completes, we obtain the power-law exponent \(\gamma =4.68\) corresponding to the best linear fitting (on log-log scale) of X-distribution (shown in Figure 3(b)). The errors in the linear fitting of X-distributions corresponding to different values of C are plotted in Figure 3(c).


Fig. 3. (a) For \(C=4.9610\), X-distribution is calculated. Its linear fitting (on log-log scale) is done to calculate the power-law exponent, \(\gamma =2.8885\). The error in fitting is 219.3640. (b) For \(C=19.7710\), X-distribution is calculated. Its linear fitting (on log-log scale) is done to calculate the power-law exponent, \(\gamma =4.68\). The error in fitting is 58.9. (c) Error in linear fitting of X-distribution (on a log-log scale) of a real-world network (Supreme court) (in blue dots) is plotted for different considered values of \(C=0.001:0.01:50\). The minimum of the pattern (pink dot) is identified to get the value of C that is expected to produce the best linear fitting (on log-log scale) of X-distribution, which is shown in subfigure (b).


3 SIMULATION AND RESULTS

3.1 Data

Here, we consider the following network models to verify the superiority of X-distribution over degree distribution in calculating the power-law exponent \(\gamma\) (Table 1): CPM [16], NRM [24], RefOrCite2 [25], BA [3], CDPAM [23], and DMS [10]. These are the state-of-the-art network models utilized for the structural reconstruction of real-world networks, in which at each time step a new node appears and gets attached to the older nodes according to the predefined rules of the respective model. The way to compute \(X_i\) for a growing network model is given in Algorithm 1. We also consider various real-world networks, for example, Biomedical, Supreme court, ArxivTH, ArxivPH, Patent, and Facebook (refer to Table 2), and calculate their power-law exponents using X-distributions and the respective degree distributions. The experimental computations are performed on a system with dual Intel Xeon Gold 5120 CPUs and 128 GB RAM. Furthermore, MATLAB implementations (using MATLAB R2022b) of the network models are used to generate networks for experimental analysis.

| Model | \(\gamma\) | \(\gamma (X)\) | \(\gamma (D)\) |
|---|---|---|---|
| CPM | 1.11 | 1.108 \(\pm\) 0.0335 | 1.2533 \(\pm\) 0.0264 |
| CPM | 1.25 | 1.1918 \(\pm\) 0.0111 | 1.3406 \(\pm\) 0.0264 |
| CPM | 1.43 | 1.3633 \(\pm\) 0.0181 | 1.4938 \(\pm\) 0.0295 |
| CPM | 1.67 | 1.5739 \(\pm\) 0.0253 | 1.6930 \(\pm\) 0.0200 |
| CPM | 2.00 | 1.8918 \(\pm\) 0.0418 | 1.8559 \(\pm\) 0.0309 |
| CPM | 2.50 | 2.3679 \(\pm\) 0.0771 | 2.2452 \(\pm\) 0.0959 |
| CPM | 3.30 | 3.1549 \(\pm\) 0.1362 | 2.7707 \(\pm\) 0.1593 |
| CPM | 5.00 | 4.6118 \(\pm\) 0.4223 | 3.5281 \(\pm\) 0.2973 |
| CPM | 6.67 | 6.349 \(\pm\) 0.9251 | 4.1296 \(\pm\) 0.4066 |
| CPM | 8.33 | 7.5686 \(\pm\) 1.5666 | 4.4651 \(\pm\) 0.4232 |
| CPM | 10.00 | 9.1658 \(\pm\) 1.7703 | 4.8878 \(\pm\) 0.5612 |
| NRM | 1.33 | 1.4103 \(\pm\) 0.8030 | 0.8707 \(\pm\) 0.2851 |
| NRM | 1.72 | 1.9124 \(\pm\) 0.0588 | 2.1771 \(\pm\) 0.0499 |
| NRM | 2.27 | 2.3436 \(\pm\) 0.0746 | 2.3249 \(\pm\) 0.0895 |
| NRM | 3.57 | 3.8684 \(\pm\) 0.3082 | 3.2931 \(\pm\) 0.1313 |
| NRM | 5.26 | 5.6594 \(\pm\) 0.5490 | 4.0311 \(\pm\) 0.3263 |
| NRM | 10.26 | 11.6199 \(\pm\) 3.1616 | 5.5043 \(\pm\) 0.7103 |
| RefOrCite2 | 1.11 | 1.3226 \(\pm\) 0.0024 | 0.6909 \(\pm\) 0.0492 |
| RefOrCite2 | 1.25 | 1.3762 \(\pm\) 0.0124 | 1.0242 \(\pm\) 0.0638 |
| RefOrCite2 | 1.43 | 1.4581 \(\pm\) 0.0103 | 1.3097 \(\pm\) 0.0637 |
| RefOrCite2 | 1.67 | 1.5833 \(\pm\) 0.0165 | 1.5627 \(\pm\) 0.0645 |
| RefOrCite2 | 2.00 | 1.9729 \(\pm\) 0.0321 | 1.8303 \(\pm\) 0.0855 |
| RefOrCite2 | 2.50 | 2.5095 \(\pm\) 0.0303 | 2.1749 \(\pm\) 0.1282 |
| RefOrCite2 | 3.30 | 3.4217 \(\pm\) 0.1190 | 2.6272 \(\pm\) 0.1875 |
| RefOrCite2 | 5.00 | 5.2578 \(\pm\) 0.4249 | 3.2503 \(\pm\) 0.3531 |
| RefOrCite2 | 6.67 | 6.7698 \(\pm\) 0.7115 | 3.6699 \(\pm\) 0.4457 |
| RefOrCite2 | 8.33 | 8.3429 \(\pm\) 1.1467 | 4.1082 \(\pm\) 0.5815 |
| RefOrCite2 | 10.00 | 9.9375 \(\pm\) 1.9204 | 4.1476 \(\pm\) 0.5886 |

| Model | \(\gamma\) | \(\gamma (X,10)\) | \(\gamma (D,10)\) | \(\gamma (X,20)\) | \(\gamma (D,20)\) |
|---|---|---|---|---|---|
| BA | 2.0 | 1.942 | 1.914 \(\pm\) 0.0116 | 1.9629 | 1.8985 \(\pm\) 0.0033 |
| CDPAM | 1.00 | 1.4144 \(\pm\) 0.0775 | 1.3601 \(\pm\) 0.5103 | 0.7845 \(\pm\) 0.05 | 2.4621 \(\pm\) 0.1468 |
| CDPAM | 1.33 | 1.0478 \(\pm\) 0.0477 | 1.0417 \(\pm\) 0.0774 | 0.9 \(\pm\) 0.005 | 0.8898 \(\pm\) 0.0436 |
| CDPAM | 1.60 | 1.4927 \(\pm\) 0.0250 | 1.8979 \(\pm\) 0.0039 | 1.5227 \(\pm\) 0.015 | 2.2263 \(\pm\) 0.0818 |
| CDPAM | 1.82 | 1.7552 \(\pm\) 0.005 | 1.8143 \(\pm\) 0.0158 | 1.8059 \(\pm\) 0.003 | 1.8665 \(\pm\) 0.0073 |
| CDPAM | 1.91 | 1.8292 \(\pm\) 0.005 | 1.8744 \(\pm\) 0.0078 | 1.8934 \(\pm\) 0.005 | 1.9078 \(\pm\) 0.0046 |
| CDPAM | 1.99 | 1.9186 \(\pm\) 0.005 | 1.9329 \(\pm\) 0.0108 | 1.9579 \(\pm\) 0.005 | 1.9229 \(\pm\) 0.0050 |

| Model | \(\gamma\) (\(\overline{k}=10\)) | \(\gamma (X,10)\) | \(\gamma (D,10)\) | \(\gamma\) (\(\overline{k}=20\)) | \(\gamma (X,20)\) | \(\gamma (D,20)\) |
|---|---|---|---|---|---|---|
| DMS | 2.01 | 1.9456 \(\pm\) 0.0175 | 1.9405 \(\pm\) 0.0114 | 2.01 | 1.9715 \(\pm\) 0.012 | 2.2384 \(\pm\) 0.0756 |
| DMS | 3.00 | 2.8710 \(\pm\) 0.0175 | 2.2702 \(\pm\) 0.0267 | 2.50 | 2.3641 \(\pm\) 0.001 | 1.8979 \(\pm\) 0.0031 |
| DMS | 4.00 | 3.9326 \(\pm\) 0.0175 | 2.4381 \(\pm\) 0.0288 | 3.00 | 2.87 \(\pm\) 0.01 | 1.8996 \(\pm\) 0.0032 |
| DMS | 12.00 | 7.65 \(\pm\) 0.0175 | 2.8706 \(\pm\) 0.0363 | 7.00 | 5.001 \(\pm\) 0.010 | 1.9120 \(\pm\) 0.0026 |
  • The values in \(\gamma (X)\) column are calculated using the X-distribution and the values in \({\gamma (D)}\) column are calculated using the Degree distribution.

Table 1. Networks of Size \(10^5\) Nodes Under CPM, NRM, RefOrCite2, BA, CDPAM, and DMS Models Are Considered


| Networks | Description | Nodes | Edges | Ref |
|---|---|---|---|---|
| Biomedical | Consists of biomedical papers indexed in NCBI (2001–2008) | 43,937 | 162,404 | [25] |
| Supreme Court | US Supreme Court cases (1754–2002). Judgements refer to previous judgements | 25,417 | 446,490 | [11] |
| ArxivTH | High Energy Physics - Theory papers from arXiv.org (1992–2002) | 27,770 | 352,807 | [17] |
| ArxivPH | High Energy Physics - Phenomenology papers from arXiv.org (1992–2002) | 34,546 | 421,578 | [17] |
| Patent | Citation network among U.S. Patents | 3,774,768 | 16,518,948 | [17] |
| Facebook | Network of posts to other users' walls on Facebook | 46,952 | 876,993 | [27] |

Table 2. Brief Descriptions of Datasets with Nodes and Edges

3.2 Effectiveness of X-Distribution

This section discusses the experimental analysis of X-distribution through Algorithm 1 for the considered network models to show the effectiveness of the proposed X-distribution. To cover a wide range of \(\gamma\) in the simulations, we set the parameter values of the different models as follows: parameter p in CPM is set to 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.12, 0.1; \((p,\beta)\) in NRM are set to \((0.5,0.5),\) \((0.3,0.4),\) \((0.2,0.3),\) \((0.1,0.2),\) \((0.1,0.1),\) \((0.05,0.05)\); and in RefOrCite2, \(p=0.4\) and q is set to 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.12, 0.1.

In the BA, CDPAM, and DMS models, the average degree (\(\overline{k}\)) is also an input parameter. We set \(\overline{k} \in \lbrace 10,20\rbrace\) for the experimental simulations. \(\gamma (X,10)\) denotes the value of \(\gamma\) obtained from X-distribution at \(\overline{k}=10\); \(\gamma (D,10)\), \(\gamma (X,20)\), and \(\gamma (D,20)\) are defined analogously. In the CDPAM model, parameter \(\beta\) is set to 0.5, 1, 2, 5, 10, 1000, and in the DMS model, parameter \(\beta\) is set to 0.1, 10, 20, 100, each with \(\overline{k} \in \lbrace 10,20\rbrace\).

For each of the given parameter settings of the different models, 100 networks of \(10^5\) nodes each are simulated. Then, power-law exponents \(\gamma (X)\) and \(\gamma (D)\) are calculated using X-distributions and degree distributions, respectively. The mean values with standard deviations are reported in Table 1. In Table 1, \(\gamma\) is the theoretical value corresponding to the input parameters. The mathematical formulation for computing \(\gamma\) is given in Table 3. A numerically simulated value (\(\gamma (X)\) or \(\gamma (D)\)) closer to the theoretical \(\gamma\) corresponds to a more accurate estimation of the power-law exponent. Values written in bold are closer to the theoretical \(\gamma\). From Table 1, it is observed that X-distribution outperforms degree distribution in most of the cases. Where degree distribution gives the closer estimate, its advantage is marginal, whereas the improvements obtained with X-distribution are significant.
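The comparison described above can be made concrete for the CPM rows of Table 1. The sketch below copies the theoretical \(\gamma\) and the reported mean values of \(\gamma (X)\) and \(\gamma (D)\) from Table 1 (standard deviations are ignored, which is a simplification made here) and marks which estimate lies closer to the theoretical value. The names `CPM_ROWS` and `closer_estimator` are hypothetical.

```python
# (theoretical gamma, mean gamma(X), mean gamma(D)) for the CPM rows of Table 1.
CPM_ROWS = [
    (1.11, 1.108, 1.2533),
    (1.25, 1.1918, 1.3406),
    (1.43, 1.3633, 1.4938),
    (1.67, 1.5739, 1.6930),
    (2.00, 1.8918, 1.8559),
    (2.50, 2.3679, 2.2452),
    (3.30, 3.1549, 2.7707),
    (5.00, 4.6118, 3.5281),
    (6.67, 6.349, 4.1296),
    (8.33, 7.5686, 4.4651),
    (10.00, 9.1658, 4.8878),
]


def closer_estimator(gamma, gamma_x, gamma_d):
    """Return 'X' if gamma(X) is closer to the theoretical gamma, else 'D'."""
    return "X" if abs(gamma_x - gamma) < abs(gamma_d - gamma) else "D"


winners = [closer_estimator(*row) for row in CPM_ROWS]
x_wins = winners.count("X")

for (gamma, gx, gd), w in zip(CPM_ROWS, winners):
    # Print absolute errors of both estimates next to the closer one.
    print(f"{gamma:5.2f}  |X err|={abs(gx - gamma):.4f}  |D err|={abs(gd - gamma):.4f}  closer: {w}")
```

On these mean values, \(\gamma (X)\) is the closer estimate in nine of the eleven CPM rows; in the two rows where \(\gamma (D)\) is closer (theoretical \(\gamma\) of 1.43 and 1.67), both estimates are within 0.1 of the theoretical value.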

| Model | CPM | NRM | RefOrCite2 | BA | CDPAM | DMS |
|---|---|---|---|---|---|---|
| \(\gamma\) | \(\dfrac{1}{p}\) | \(\dfrac{1}{\beta +(1-\beta)p}\) | \(\dfrac{1}{q}\) | 2 | \(\dfrac{2\beta }{\beta +0.5}\) | \(2+\frac{\beta }{\overline{k}}\) |

Table 3. Power-law Exponent ( \(\gamma\) ) of Different State-of-the-art Network Models
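The formulas in Table 3 translate directly into small functions, which makes it easy to reproduce the theoretical \(\gamma\) column of Table 1 from the parameter settings listed earlier. The function names below are hypothetical; the formulas are those of Table 3, with NRM parameters given in the order \((p,\beta)\).

```python
def gamma_cpm(p):
    """CPM: gamma = 1 / p."""
    return 1 / p


def gamma_nrm(p, beta):
    """NRM: gamma = 1 / (beta + (1 - beta) * p)."""
    return 1 / (beta + (1 - beta) * p)


def gamma_reforcite2(q):
    """RefOrCite2: gamma = 1 / q."""
    return 1 / q


def gamma_ba():
    """BA: gamma = 2 according to Table 3."""
    return 2.0


def gamma_cdpam(beta):
    """CDPAM: gamma = 2 * beta / (beta + 0.5)."""
    return 2 * beta / (beta + 0.5)


def gamma_dms(beta, k_bar):
    """DMS: gamma = 2 + beta / k_bar."""
    return 2 + beta / k_bar


# Reproduce the theoretical NRM values for the (p, beta) settings used above.
nrm_settings = [(0.5, 0.5), (0.3, 0.4), (0.2, 0.3), (0.1, 0.2), (0.1, 0.1), (0.05, 0.05)]
print([round(gamma_nrm(p, b), 2) for p, b in nrm_settings])
```

For instance, the DMS setting \(\beta =100\), \(\overline{k}=10\) gives \(\gamma =12\), and \(\beta =100\), \(\overline{k}=20\) gives \(\gamma =7\), matching the last DMS row of Table 1.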

For visual verification, the degree distributions and X-distributions of the considered models are plotted in Figure 4 on a log-log scale. The X-distribution plots (dark green hexagons) are more linear than the respective degree distribution plots (pink squares). The extensive experimental results support the claim that X-distribution can estimate the power-law exponent more accurately than the respective degree distribution.


Fig. 4. (Best viewed in color.) X-distributions (X-dist) and Degree distributions (D-dist) for (a) CPM (\(p=0.7\)), (b) NRM (\(p=0.4, \beta =0.2\)), (c) RefOrCite2 (\(p=0.4, \; q=0.2\)), (d) BA (\(\overline{k}=5\)), (e) CDPAM (\(\beta =5,\overline{k}=5\)), and (f) DMS (\(\beta =10,\overline{k}=5\)) models.

We also calculate the goodness of fit (GoF), using Mean-squared Error (MSE) as the cost function, and the test statistic of the Two-sample Kolmogorov-Smirnov Test (K2) [20] to evaluate the quality of fitting of the X-distribution and degree distribution of networks obtained under different network models (considered in Figure 4) and of real-world networks (in Table 2). The results are reported in Table 4. Lower values (values in bold in Table 4) of GoF and K2 signify better curve fitting. From Table 4, it is observed that, in most cases, the X-distribution of a network exhibits better curve fitting, with lower values of GoF and K2, than the corresponding degree distribution.
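The two quality measures named above can be sketched as follows. The article does not spell out exactly which pairs of values are compared, so this sketch only shows the generic definitions: MSE between observed and fitted values, and the two-sample Kolmogorov-Smirnov statistic as the maximum absolute difference between two empirical CDFs. The function names are hypothetical.

```python
from bisect import bisect_right


def mean_squared_error(observed, fitted):
    """Mean of squared differences between observed and fitted values."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)


def ks_two_sample_statistic(sample_a, sample_b):
    """Maximum absolute difference between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = set(a) | set(b)
    # The empirical CDF at v is the fraction of sample values <= v.
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in points
    )


print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

Both measures are zero for a perfect match, which is why lower values indicate a better fit.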

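| Data/Model | X-distribution GoF | X-distribution K2 | Degree distribution GoF | Degree distribution K2 |
|---|---|---|---|---|
| Biomedical | 0.0084 | 0.0452 | 0.0235 | 0.0480 |
| Supreme Court | 0.0026 | 0.0166 | 0.1212 | 0.0651 |
| ArxivTH | 0.0029 | 0.0273 | 0.0210 | 0.0336 |
| ArxivPH | 0.0085 | 0.0466 | 0.0554 | 0.0362 |
| Patent | 0.0066 | 0.0614 | 0.0110 | 0.0375 |
| Facebook | 0.0018 | 0.0307 | 0.1579 | 0.0791 |
| CPM | 0.0227 | 0.0183 | 0.0060 | 0.0240 |
| NRM | 0.0028 | 0.0603 | 0.0184 | 0.0339 |
| RefOrCite2 | 0.0036 | 0.0250 | 0.0617 | 0.0500 |
| BA | 0.0118 | 0.0193 | 0.0237 | 0.0433 |
| CDPAM | 0.0485 | 0.0697 | 0.0159 | 0.0254 |
| DMS | 0.0034 | 0.0125 | 0.0302 | 0.0368 |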
  • Values in bold (lower values of GoF or K2) represent better performance.

Table 4. Goodness of Fit (GoF) Values Using the MSE (Mean-squared Error) Cost Function; K2 Represents the Test Statistic of the Two-sample Kolmogorov-Smirnov Test [20], Which Measures the Maximum Absolute Difference between the CDFs of Two Input Distributions


3.3 Consistency of X-distribution

In this section, we discuss the stability and consistency of the proposed algorithm for calculating \(\gamma\). It has already been mentioned that most of the models follow the power law in their tail (for higher values of degree). Thus, the size of the network plays a critical role in the estimation of \(\gamma\). Here, two networks of different sizes (having \(10^5\) and \(10^6\) nodes) are generated using CPM, NRM, RefOrCite2, BA, CDPAM, and DMS models, and their X-distribution and degree distributions are plotted in Figure 5. The degree distribution of a model network follows the power law in its tail, and maximum deviation is observed in the tail. It may result in inaccurate computation of \(\gamma\). From the figure, it is observed that X-distribution is more stable and consistent with the growth of the network until the model changes its parameters.


Fig. 5. Consistency of Degree distribution and X-distribution with the growth of networks generated using (a) CPM (\(p=0.7\)), (b) NRM (\(p=0.4, \beta =0.2\)), (c) RefOrCite2 (\(p=0.4, \; q=0.2\)), (d) BA (\(\overline{k}=5\)), (e) CDPAM (\(\beta =5\), \(\overline{k}=5\)), and (f) DMS (\(\beta =10\), \(\overline{k}=5\)) network reconstruction models.

3.4 Verification on Real-world Networks

We also applied Algorithm 1 to several real-world networks, namely, Biomedical, Supreme court, ArxivTH, ArxivPH, Patent, and Facebook (for descriptions refer to Table 2), and evaluated their power-law exponents using X-distribution and degree distribution. The plots of the distributions are available in Figure 6. The plots of the X-distributions are visibly more linear than the degree distribution plots, and Table 4 confirms that X-distributions exhibit more linearity (low values of GoF and K2) on the log-log scale than degree distributions in most of the cases. Thus, X-distribution is more suitable for the evaluation of the power-law exponent using linear fitting (on the log-log scale).


Fig. 6. (Best viewed in color.) X-distributions and Degree distributions for (a) Biomedical, (b) Supreme Court, (c) ArxivTH, (d) ArxivPH, (e) Patent, and (f) Facebook real-world datasets.


4 CONCLUSION

In this article, the problem of retraceability of the microdynamics of a growing network is considered, which is important in network reconstruction. We propose the X-distribution to compute the model parameters more accurately than is possible with the networks' associated degree distribution. Retracing the parameter values using X-distribution is successfully applied to the networks obtained under the BA, CPM, NRM, RefOrCite2, CDPAM, and DMS models. The experimental results show that the X-distribution is more consistent with the growth of a network than the degree distribution. We also verified the effectiveness of X-distribution over degree distribution on various real-world networks.

In the future, X-distribution will be applied to real-world data to analyze the universality of the power law and the reconstruction of real-world networks. One can further explore better network reconstruction models based on X-distribution. To retrace their microdynamics, we will further study the applicability of X-distribution to other real-world networks, including weighted networks, signed networks, and multi-layer networks. Additionally, we will investigate the effectiveness of X-distribution in temporal and dynamic environments where node degrees and connections change over time. Such investigations can provide insights into the structural dynamics of real-world networks. Furthermore, the relation between X-distribution and the community structure of the network is still unexplored; we also keep this for our future work.


REFERENCES

  1. [1] Arya A. and Pandey P. K.. 2022. Structural reconstruction of signed social networks. In IEEE Transactions on Computational Social Systems 10, 5 (2022), 2599–2612. Oct. 2023, DOI:Google ScholarGoogle ScholarCross RefCross Ref
  2. [2] Arya A., Pandey P. K., and Saxena A.. 2023. Balanced and unbalanced triangle count in signed networks. In IEEE Transactions on Knowledge and Data Engineering 35, 12 (2023), 12491–12496. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. [3] Barabási Albert-László and Albert Réka. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509512.Google ScholarGoogle ScholarCross RefCross Ref
  4. [4] Bhat U., Krapivsky P. L., Lambiotte R., and Redner S.. 2016. Densification and structural transitions in networks that grow by node copying. Phys. Rev. E 94, 6 (2016), 062302.Google ScholarGoogle ScholarCross RefCross Ref
  5. [5] Bianconi Ginestra and Barabási A.-L.. 2001. Competition and multiscaling in evolving networks. Europhys. Lett. 54, 4 (2001), 436.Google ScholarGoogle ScholarCross RefCross Ref
  6. [6] Bohn Steffen and Magnasco Marcelo O.. 2007. Structure, scaling, and phase transition in the optimal transport network. Phys. Rev. Lett. 98, 8 (2007), 088702.Google ScholarGoogle ScholarCross RefCross Ref
  7. [7] Chakrabarti Deepayan and Faloutsos Christos. 2006. Graph mining: Laws, generators, and algorithms. ACM Comput. Surveys 38, 1 (2006), 2.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. [8] Clauset Aaron, Shalizi Cosma Rohilla, and Newman Mark E. J.. 2009. Power-law distributions in empirical data. SIAM Rev. 51, 4 (2009), 661703.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. [9] Cohen Reuven and Havlin Shlomo. 2003. Scale-free networks are ultrasmall. Phys. Rev. Lett. 90, 5 (2003), 058701.Google ScholarGoogle ScholarCross RefCross Ref
  10. [10] Dorogovtsev Sergey N., Mendes José Fernando F., and Samukhin Alexander N.. 2000. Structure of growing networks with preferential linking. Phys. Rev. Lett. 85, 21 (2000), 4633.
  11. [11] Fowler James H. and Jeon Sangick. 2008. The authority of Supreme Court precedent. Soc. Netw. 30, 1 (2008), 16–30.
  12. [12] Girvan Michelle and Newman Mark E. J.. 2002. Community structure in social and biological networks. Proc. Natl. Acad. Sci. U.S.A. 99, 12 (2002), 7821–7826.
  13. [13] González Marta C., Lind Pedro G., and Herrmann Hans J.. 2006. System of mobile agents to model social networks. Phys. Rev. Lett. 96, 8 (2006), 088702.
  14. [14] Guimera Roger, Mossa Stefano, Turtschi Adrian, and Amaral L. A. Nunes. 2005. The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles. Proc. Natl. Acad. Sci. U.S.A. 102, 22 (2005), 7794–7799.
  15. [15] Holme Petter and Kim Beom Jun. 2002. Growing scale-free networks with tunable clustering. Phys. Rev. E 65, 2 (2002), 026107.
  16. [16] Krapivsky Pavel L. and Redner Sidney. 2005. Network growth by copying. Phys. Rev. E 71, 3 (2005), 036118.
  17. [17] Leskovec Jure, Kleinberg Jon, and Faloutsos Christos. 2005. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. 177–187.
  18. [18] Leskovec Jure, Kleinberg Jon, and Faloutsos Christos. 2007. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1, 1 (2007), 2.
  19. [19] Li Xiang and Chen Guanrong. 2003. A local-world evolving network model. Physica A: Stat. Mech. Appl. 328, 1 (2003), 274–286.
  20. [20] Massey Frank J., Jr.. 1951. The Kolmogorov-Smirnov test for goodness of fit. J. Amer. Stat. Assoc. (1951), 68–78.
  21. [21] Newman Mark. 2018. Networks. Oxford University Press.
  22. [22] Onnela J.-P., Saramäki Jari, Hyvönen Jorkki, Szabó György, Lazer David, Kaski Kimmo, Kertész János, and Barabási A.-L.. 2007. Structure and tie strengths in mobile communication networks. Proc. Natl. Acad. Sci. U.S.A. 104, 18 (2007), 7332–7336.
  23. [23] Pandey Pradumn Kumar and Adhikari Bibhas. 2015. Context dependent preferential attachment model for complex networks. Physica A: Stat. Mech. Appl. 436 (2015), 499–508.
  24. [24] Pandey Pradumn Kumar and Adhikari Bibhas. 2017. A parametric model approach for structural reconstruction of scale-free networks. IEEE Trans. Knowl. Data Eng. 29, 10 (2017), 2072–2085.
  25. [25] Pandey Pradumn Kumar, Singh Mayank, Goyal Pawan, Mukherjee Animesh, and Chakrabarti Soumen. 2020. Analysis of reference and citation copying in evolving bibliographic networks. J. Informetr. 14, 1 (2020), 101003.
  26. [26] Saxena Akrati. 2022. Evolving models for dynamic weighted complex networks. Principles of Social Networking: The New Horizon and Emerging Challenges (2022), 177–208.
  27. [27] Viswanath Bimal, Mislove Alan, Cha Meeyoung, and Gummadi Krishna P.. 2009. On the evolution of user interaction in Facebook. In Proceedings of the 2nd ACM Workshop on Online Social Networks. 37–42.
  28. [28] Wang Wen-Xu, Hu Bo, Wang Bing-Hong, and Yan Gang. 2006. Mutual attraction model for both assortative and disassortative weighted networks. Phys. Rev. E 73, 1 (2006), 016133.

            • Published in

              ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 5
              June 2024
              699 pages
              ISSN: 1556-4681
              EISSN: 1556-472X
              DOI: 10.1145/3613659

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 27 February 2024
              • Online AM: 30 December 2023
              • Accepted: 15 December 2023
              • Received: 9 October 2023