1 Introduction

The Internet of Things (IoT) is a new paradigm quickly gaining traction in today’s wireless communication environment [1,2,3,4,5]. This concept emphasizes the pervasiveness of objects in our environments, from home appliances and education to vehicles and transportation [6,7,8,9]. Radio-frequency Identification (RFID) tags, actuators, sensors, and mobile data are also included in this technology [10,11,12]. Sensing, connectivity, and item interaction are some key characteristics of IoT [10, 12,13,14]. The Industrial IoT (IIoT) incorporates current Information and Communications Technology (ICT) advancements into industrial manufacturing systems [15,16,17]. It essentially denotes the subsequently digitized interconnection of industrial manufacturing [18]. Data will be readily merged, correlated, compared, and incorporated into IIoT applications like smart cities, smart homes, and smart energy services, notwithstanding the heterogeneity of IIoT devices [19]. Base stations process data quickly. Each node with resource and energy limitations uses data aggregation from different sites [20, 21].

The fundamental purpose of a data aggregation approach is to improve energy consumption, network lifetime, traffic bottlenecks, and data accuracy by collecting and aggregating data packets efficiently [22,23,24]. Removing duplicate data reduces energy use and boosts energy security [25, 26]. Data aggregation efficiency is determined by the network design and the volume of sensed data. Because the amount of sensed data is vast, heavy network traffic must be minimized [27, 28]. Data aggregation is therefore a serious issue in the IIoT, and the problem is NP-hard [29]. This article proposes a method for data aggregation in IIoT that uses the artificial bee cloning algorithm, genetic operators, and density correlation degree; it is compared with the Reliable Spanning Tree (RST) construction algorithm, and the results are evaluated. Data aggregation is an essential network technique because it saves energy by reducing data transmission [30]. The goal is to create a spanning tree from an IIoT graph based on residual energy and high reliability. The approach aims to increase the residual energy and reliability of IIoT platforms and to reduce distance and displacement probability.

Section 2 summarizes recent research related to this study. Section 3 describes the system model in depth. Section 4 presents the findings of the proposed approach. Finally, Section 5 gives a succinct summary of the study and a conclusion.

2 Related work

IoT data aggregation enables the efficient collection, analysis, and dissemination of large amounts of data from multiple sources. By improving data security, reducing energy consumption, and improving network performance, data aggregation plays a vital role in enabling informed decision-making and analysis in the IoT domain. So, Chandnani and Khairnar [31] presented a trust-based safe data aggregation approach and an energy-efficient secure routing protocol for data aggregation and routing in IoT.

Sajedi, et al. [32] introduced F-LEACH, a fuzzy-based data aggregation strategy, for IoT-enabled healthcare applications to maximize network longevity. Furthermore, the membership functions of the fuzzy inference system were optimized, and the average of numerous executions was chosen as the ideal parameter by adjusting the network scenario. Also, Zhu, et al. [33] developed over-the-air computation as a task-oriented method for wireless data aggregation by smoothly merging communication and computing.

In addition, Zhang, et al. [34] suggested a learning-based sparse data reconstruction approach that combines compressed sensing and deep learning. They aimed to minimize the data carried via IoT networks while maintaining reconstruction accuracy. A deep CS network was created using an end-to-end learning strategy to create a measurement matrix and an efficient and high-accuracy reconstruction network. Mo, Ahmed, et al. [35] suggested an energy-efficient, secure, and data-aggregated architecture employing blockchain technology for IoT devices to address the security and energy criteria. The proposed method used blockchain technology and data correlation reduction to safeguard IoT networks from fraudulent activity.

Previous approaches to data aggregation within the IoT domain have made commendable strides in enhancing security, maximizing energy efficiency, and optimizing network performance. However, these methods often face constraints in accurately computing trust values, adapting swiftly to evolving network conditions, managing heavy traffic loads, and requiring substantial data and computational resources. In contrast, the approach proposed in this paper strives to overcome these limitations by leveraging the artificial bee cloning algorithm and genetic operators. These techniques are utilized to create and refine spanning trees tailored for data transmission within the IIoT. By harnessing these advanced algorithms, our method endeavors to elevate reliability, curtail energy consumption, and significantly extend the lifespan of data aggregation within the IIoT landscape. This approach is poised to offer a more adaptive and resource-efficient framework that adeptly navigates the challenges posed by dynamic network environments, thereby fortifying the efficacy of data aggregation in industrial IoT settings.

3 Proposed method

In this section, the proposed method is described in three subsections. The first subsection explains the aggregation problem and its formal statement. The second subsection describes the system model. The third subsection presents the DABCG-IoT algorithm, a hybrid algorithm for solving the data aggregation problem.

3.1 Problem statement

This section formally states the data aggregation problem in the IoT context by modeling the IoT devices as the nodes of an undirected connected graph G = (V, E). Here, V is the set of network nodes, comprising the devices and the base stations [36], and E is the set of links connecting these nodes, which forms the communication framework of the IoT environment [20].

$$V=\{v_{1},v_{2},\dots ,v_{n}\}$$
(1)
$$E=\{e_{1},e_{2},\dots ,e_{m}\}$$
(2)

In addition, each edge of this graph carries p positive real numbers that represent its attributes, such as distance, cost, and other pertinent factors that characterize the link between the connected nodes. Eq. (3) defines these attributes for each edge [37].

$$W_{i}=\left\{W_{1i},W_{2i},\dots ,W_{pi}\right\},\quad i=1,2,\dots ,m$$
(3)

Suppose \(x=\left(x_{1},x_{2},\dots ,x_{m}\right)\) is defined as follows:

$$x_{i}=\left\{\begin{array}{ll}1 & \text{if edge } e_{i} \text{ is selected}\\ 0 & \text{if edge } e_{i} \text{ is not selected}\end{array}\right.$$
(4)

Creating a Minimum Spanning Tree (MST) for IoT data aggregation is crucial. This tree optimizes the connections between IoT devices represented as a graph, reducing overall communication costs while efficiently collecting and processing data. The MST offers an organized structure that minimizes latency and resource usage, ensuring efficient data aggregation and enhancing the overall effectiveness of IoT networks. A spanning tree of graph G can be expressed by the vector x. Letting X denote the set of all vectors that correspond to spanning trees of G, the minimum spanning tree problem can be expressed as follows:

$$\min z(x)=\sum_{i=1}^{m}w_{i}x_{i},\quad x\in X$$
(5)

Accordingly, the data aggregation problem aims to find the minimum spanning tree of the graph of IoT devices and to aggregate data using this tree. In Eq. (5), the objective function \(z(x)\) is the heart of the MST problem in IoT data aggregation. It is the sum of the weights of the edges in the chosen spanning tree, where each edge weight \(w_{i}\) captures attributes such as distance and cost. Minimizing this sum is crucial for efficient data aggregation in the IoT network: a minimum spanning tree connects the IoT devices while reducing communication costs and resource usage. The objective function \(z(x)\) thus provides a quantifiable metric that guides the algorithm toward an edge configuration forming a minimum spanning tree.
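To make Eq. (5) concrete, the following minimal sketch computes a minimum spanning tree of a small graph and returns it as the binary vector x of Eq. (4). It uses Kruskal's classical algorithm, not the method proposed in this paper, and it assumes each edge has already been reduced to a single scalar weight. The example graph, the function names, and the edge weights are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (endpoint u, endpoint v, scalar weight w_i)


def find(parent: List[int], a: int) -> int:
    """Return the representative of a's component (with path halving)."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def kruskal_selection(n: int, edges: List[Edge]) -> List[int]:
    """Return x, where x[i] = 1 if edge i belongs to a minimum spanning tree."""
    parent = list(range(n))
    x = [0] * len(edges)
    # Visit edges from lightest to heaviest, keeping their original indices.
    for i in sorted(range(len(edges)), key=lambda j: edges[j][2]):
        u, v, _ = edges[i]
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:  # the edge joins two components, so it closes no cycle
            parent[ru] = rv
            x[i] = 1
    return x


def objective(edges: List[Edge], x: List[int]) -> float:
    """z(x) = sum_i w_i * x_i, as in Eq. (5)."""
    return sum(w * xi for (_, _, w), xi in zip(edges, x))


# Hypothetical 4-node graph with 5 weighted edges.
edges = [(0, 1, 4.0), (0, 2, 1.0), (1, 2, 2.0), (1, 3, 5.0), (2, 3, 8.0)]
x = kruskal_selection(4, edges)
print(x, objective(edges, x))  # [0, 1, 1, 1, 0] 8.0
```

With a single scalar weight per edge, an exact algorithm of this kind suffices; the search-based approach of this paper targets the case where several criteria are combined.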

3.2 System model

This section examines a four-layer IoT model [38] that has sensor, network, management, and application layers. The sensor layer, at the bottom, contains a sensor network of sensors, actuators, and tags. In this layer, the IoT devices aim to create a reliable and energy-efficient tree. The network layer comprises extensive networks, mobile communication networks, Wi-Fi, Ethernet, and so on. Once the tree is created, data aggregation is performed using it: IoT devices send data to the base station through the tree, and the base station sends the data to the management layer via the Internet so that the desired decisions and actions can be taken. The management layer handles issues such as data analysis, IoT device management, and security control. The simulation and implementation of the proposed algorithm are done in this layer.

The processes for allowing search and discovery for IoT resources, such as devices and objects, are discussed in this section. The main phases for developing IoT that can search for real-time resources are divided into three phases, each comprising two processes [39]. Step (1) Data preparation: Because IoT systems create datasets about objects' statuses and measurements, such datasets, or data streams, must be prepped for storage and indexing. IoT and base stations collaborate to produce datasets for fog-edge nodes, such as cloudlets. Step (2) Indexing: This is where IoT spiders, or crawlers, scan and analyze the provided data stream regularly, subsequently construct indexes, and plan crawling operations. The first level (lower level indexes) is built on fog-edge nodes, while the second level (higher level indexes) is built on the cloud. Lastly, Step (3) Searching involves end-users writing queries, which are then executed by IoT and returned in the form of prioritized lists of retrieved resources [40].

First, the construction of the random spanning tree and the encoding and objective function of the problem are described. The steps involved in constructing a random spanning tree entail several key processes. First, multiple spanning trees are generated from a given graph, and the tree with the lowest total edge weight is identified as the minimum spanning tree (MST). These trees, similar in vertex count to the original graph, exhibit one less edge and are devoid of cycles or loops. The proposed algorithm initiates the random generation and enhancement of these trees, optimizing their topology for effective data aggregation and network enhancement in IoT environments. This construction method plays a crucial role in optimizing data aggregation and network efficiency within IoT ecosystems. The process facilitates the creation of structured tree topologies, allowing devices to transmit sensitive data to adjacent nodes and base stations securely. These optimized tree structures substantially reduce energy consumption, enhance network reliability, and prolong the network’s lifespan. By systematically generating and improving spanning trees, the algorithm identifies the most efficient structures for IoT networks, achieving optimized data aggregation and overall network performance. The encoding of the spanning trees using binary arrays and matrices further aids in visualizing and evaluating the fitness of these structures based on criteria like displacement probability, energy utilization, and node offspring count, ultimately contributing to enhanced network longevity and reliability in IoT environments. Then, the proposed method for aggregating data in IoT nodes is described in some steps and detail with an example. Tree structure lets devices send sensitive data to neighbors and the base station. We can aggregate data, reduce energy use, improve reliability, and extend network life with a suitable tree structure. 
Several spanning trees should be created from the desired graph; the tree with the lowest total edge weight is the minimum spanning tree. The spanning tree can have the same number of vertices as the graph and one less edge. Trees should not have loops or cycles. The proposed algorithm generates and improves trees randomly. Figure 1a demonstrates the spanning tree T from graph G [41]. Also, generating multiple spanning trees from a graph is crucial for optimizing data aggregation and enhancing IoT network efficiency. The algorithm focuses on identifying the MST among them, minimizing total edge weight, and ensuring a loop-free structure. This approach significantly reduces energy consumption, improves network reliability, and extends network lifespan. The encoding method, using binary arrays and matrices, plays a vital role in visualizing and evaluating spanning tree fitness. The binary array efficiently represents tree topology, and the matrix offers a detailed view of node connections and exclusions. This encoding strategy enables rapid evaluation based on criteria like displacement probability, energy utilization, and offspring count. In summary, generating diverse spanning trees and employing encoding methods contribute to efficient data aggregation, energy optimization, and network longevity in IoT environments.
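The random tree construction described above can be sketched as follows. This is an illustrative assumption, not the authors' procedure: edges are visited in shuffled order and an edge is kept only if it closes no cycle (checked with a union-find structure), which always yields a spanning tree with one edge fewer than the number of vertices. The function name and the example graph are hypothetical, and this shuffle-based approach does not sample spanning trees uniformly.

```python
import random
from typing import List, Tuple


def random_spanning_tree(n: int, edges: List[Tuple[int, int]],
                         rng: random.Random) -> List[int]:
    """Return a 0/1 list marking the edges of one random spanning tree."""
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    order = list(range(len(edges)))
    rng.shuffle(order)  # a random edge order produces a random tree
    chosen = [0] * len(edges)
    for i in order:
        u, v = edges[i]
        ru, rv = find(u), find(v)
        if ru != rv:  # skip edges that would close a cycle
            parent[ru] = rv
            chosen[i] = 1
    if sum(chosen) != n - 1:
        raise ValueError("graph is not connected")
    return chosen


# Hypothetical 5-node graph; edge (3, 4) is a bridge, so every tree contains it.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
tree = random_spanning_tree(5, edges, random.Random(0))
print(tree)
```

Calling the function repeatedly with different seeds gives a population of candidate trees that a search procedure could then evaluate and improve.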

Fig. 1

a Constructing a random spanning tree, b The presentation and encoding method, c The matrix encoding, and d The objective function of the proposed method

As illustrated in Fig. 1b, several example spanning trees are built from the desired graph. The encoding method creates a binary array whose length equals the number of graph edges. Encoding a spanning tree with a binary array and a matrix structures the tree topology within IoT networks so that it can be optimized. The encoding gives a systematic representation of the graph edges, distinguishing those attached to the tree from those excluded from it. The binary array, with 1s denoting edges in the tree and 0s denoting those outside it, makes the topology easy to visualize. The matrix representation refines this with a more detailed depiction that indicates which nodes are connected within the tree structure and which are not. These encodings enable swift evaluation of a spanning tree’s fitness based on specific criteria, such as displacement probability, energy utilization, and offspring count. Ultimately, the encoding aids in identifying and selecting the most efficient tree structures in IoT networks, optimizing data aggregation, energy consumption, reliability, and network longevity. The graph in the figure has 20 edges, so an array of 20 entries is created, each entry representing one edge of the graph. Each entry is 1 or 0, where 1 indicates that the edge is attached to the tree and 0 indicates that it is not. The spanning tree is also encoded as a matrix, as follows. For instance, in Fig. 1b, node N1 is connected to node N2, so 1 is placed in the corresponding matrix entry. In contrast, node N1 is not directly connected to node N4, so 0 is placed in the corresponding entry. This process is repeated for each entry until the whole spanning tree is represented as a matrix, as seen in Fig. 1c.
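The two encodings can be illustrated with a short sketch. The graph of Fig. 1 is not reproduced here, so the 4-node graph, the chosen tree, and the function names below are hypothetical; nodes are numbered from 0, so index 0 plays the role of N1 and index 3 the role of N4.

```python
from typing import List, Tuple

EdgeList = List[Tuple[int, int]]


def to_binary_array(edges: EdgeList, tree_edges: EdgeList) -> List[int]:
    """One entry per graph edge: 1 if the edge is in the tree, else 0."""
    in_tree = {frozenset(e) for e in tree_edges}  # undirected comparison
    return [1 if frozenset(e) in in_tree else 0 for e in edges]


def to_matrix(n: int, tree_edges: EdgeList) -> List[List[int]]:
    """Symmetric n x n matrix: entry [u][v] is 1 if the tree links u and v."""
    m = [[0] * n for _ in range(n)]
    for u, v in tree_edges:
        m[u][v] = 1
        m[v][u] = 1
    return m


# Hypothetical graph with 5 edges and one of its spanning trees.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
tree_edges = [(0, 1), (1, 2), (1, 3)]

binary = to_binary_array(edges, tree_edges)
matrix = to_matrix(4, tree_edges)
print(binary)        # [1, 0, 1, 1, 0]
print(matrix[0][1])  # 1: node 0 is linked to node 1 in the tree
print(matrix[0][3])  # 0: node 0 is not directly linked to node 3
```

The binary array is compact and suits operators that flip individual edges, whereas the matrix makes parent and child lookups straightforward when the fitness criteria are evaluated.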

In graph-based IoT networks, once the trees have been randomly generated, the hop count from each node to the base station and the number of offspring of each node are determined for every tree. After these criteria are defined for each tree, the objective function calculates each tree’s fitness, which favors reliability and residual energy. The spanning tree \(t_k\) will break if its internal nodes do not work correctly, i.e., if the nodes are highly mobile or their energy is dissipated. The evaluation criteria for the fitness of the tree \(t_k\) are \(P_k\), \(E_k\), and \(H_k\).

3.2.1 Pk

\(P_k\) is the total displacement probability of the tree \(t_k\) and is calculated as follows [40]:

$$P_{k}=\sum_{n_{i}\in V}\left(ch_{i}^{k}+1\right)p_{i}$$
(6)

Here, \(p_i\) is the displacement probability of node \(n_i\), a random number in the range [0, 1]. The displacement probability of the base station is set to zero, which means that this node is assumed not to move. Also, \(ch_i^k\) is the number of offspring of node \(n_i\) in the tree \(t_k\).

$$E_{k}=\sum_{n_{i}\in V}\left(ch_{i}^{k}+1\right)e_{i}$$
(7)

In the proposed network, the optimization technique is geared towards maximizing residual energy despite the challenges posed by diminishing energy levels. This technique encompasses a range of strategies, including energy-conscious routing to prioritize paths that minimize energy consumption during data transmission, equitable energy distribution to prevent individual nodes from exhausting their energy, dynamic power management that adjusts transmission power based on real-time conditions, and the implementation of energy-efficient protocols to minimize unnecessary overhead. Moreover, adaptive network structures are employed to facilitate efficient data routing. By strategically allocating resources, regulating power usage, and optimizing network configurations, our techniques aim to conserve energy, enhance efficiency, and ultimately maximize residual energy within our IoT network. This ensures prolonged device operation and sustains the overall network.

3.2.2 Average residual energy Ek

\(E_k\), termed the average residual energy, gauges the energy left in the devices of an IoT network after data transmission, that is, the residual energy that devices can use to transmit aggregated data to the base station. It is the total residual energy of the tree \(t_k\), weighted by offspring count, and is calculated by Eq. (7) [41]. Here, \(e_i\) is the residual energy of node \(n_i\) and is a random number in the range [1, 20] joules; the base station energy is assumed to be unlimited. As in Eq. (6), \(ch_i^k\) is the number of offspring of node \(n_i\) in the tree \(t_k\). The range of \(e_i\) reflects the differing energy levels of individual nodes after data transmission, which allows a comprehensive assessment of overall energy efficiency and aids in optimizing energy management strategies.

3.2.3 Hk

Hop Count Distance (HCD) in the context of the tree tk refers to the total number of hops or intermediary nodes traversed by data packets as they move from individual nodes within the IoT network to the base station. It’s a measure of the distance or number of connections between a specific node and the base station in the given tree structure. It is the total HCD of the tree tk and is calculated as [41]:

$$H_{k}=\sum_{n_{i}\in V}h_{i}^{k}$$
(8)

Here, \(h_i^k\) is the HCD of node \(n_i\) from the base station in the tree \(t_k\).
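As a worked illustration of Eqs. (6) to (8), the sketch below evaluates the three criteria for a small tree stored as a child-to-parent map. Several points are assumptions rather than statements from the source: "offspring" is read as the number of direct children, the base station (the root) is left out of the sums for \(P_k\) and \(E_k\) because its displacement probability is zero and its energy is unlimited, and all names and numeric values are hypothetical.

```python
from typing import Dict, Optional, Tuple


def tree_metrics(parent: Dict[int, Optional[int]],
                 p: Dict[int, float],
                 e: Dict[int, float]) -> Tuple[float, float, int]:
    """Return (P_k, E_k, H_k) for a tree given as a child -> parent map."""
    # ch[i]: number of direct children of node i (assumed meaning of offspring).
    ch = {i: 0 for i in parent}
    for node, par in parent.items():
        if par is not None:
            ch[par] += 1

    def hops(i: int) -> int:
        """Count the hops from node i up to the root (the base station)."""
        h = 0
        while parent[i] is not None:
            i = parent[i]
            h += 1
        return h

    # Assumption: the base station is excluded from the P_k and E_k sums.
    sensors = [i for i in parent if parent[i] is not None]
    P = sum((ch[i] + 1) * p[i] for i in sensors)  # Eq. (6)
    E = sum((ch[i] + 1) * e[i] for i in sensors)  # Eq. (7)
    H = sum(hops(i) for i in parent)              # Eq. (8)
    return P, E, H


# Hypothetical tree: node 0 is the base station; node 3 hangs below node 1.
parent = {0: None, 1: 0, 2: 0, 3: 1}
p = {1: 0.25, 2: 0.5, 3: 0.125}   # displacement probabilities in [0, 1]
e = {1: 10.0, 2: 5.0, 3: 20.0}    # residual energies in joules
P_k, E_k, H_k = tree_metrics(parent, p, e)
print(P_k, E_k, H_k)  # 1.125 45.0 4
```

If "offspring" is instead meant as all descendants, only the computation of `ch` would need to change.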

3.2.4 Reliability

In the context of IoT systems, reliability is crucial because it ensures the stability and continuity of networks, especially when faced with challenges such as device displacement or energy reduction. This stability is essential for maintaining uninterrupted communication and functionality within the network. This criterion is formally defined as follows:

$$Reliability=\frac{DFF}{TDFF}$$
(9)

DFF is the number of device failures, caused by displacement or energy dissipation, that do not split the spanning tree; such failures always occur at leaf nodes. TDFF is the total number of failures. The reliability obtained is examined for the algorithms that follow.

3.2.5 Fitness

The evaluation criteria \({P}_{k}\), \({E}_{k}\), and \({H}_{k}\) play a crucial role in assessing the reliability and energy efficiency of the tk spanning tree in the fitness evaluation. \({P}_{k}\), represents the total displacement probability and is calculated based on the displacement probability of individual nodes in the tree. \({E}_{k}\) signifies the average residual energy utilized by devices for transmitting aggregated data to the base station, while \({H}_{k}\) measures the total HCD of the tree from the base station. These metrics collectively contribute to the assessment of the tree's reliability in the face of displacement or energy dissipation scenarios. Their calculation aids in understanding the tree's performance and its ability to remain stable while ensuring efficient energy utilization within IoT networks. Fitness as a linear combination of the above criteria is defined to minimize \({P}_{k}\) and \({H}_{k}\) while \({E}_{k}\) remains as high as possible. The fitness function serves to holistically evaluate the quality of a solution within IoT networks by combining criteria \({P}_{k}\), \({E}_{k}\), and \({H}_{k}\) and network reliability. Through a weighted linear combination, the function minimizes displacement probability and hop count distance while maximizing residual energy and network reliability, all normalized for fair comparison. This amalgamation aims to assess the stability, energy efficiency, and reliability of the spanning tree structure. By optimizing these factors together, the function provides a comprehensive measure of the tree's performance, facilitating the identification of solutions that strike a balance between stability, energy efficiency, and overall reliability in IoT networks. The fitness function in the proposed method is defined as follows:

$$Fitness = w_{1} \left( {\frac{{P_{k} - P_{min} }}{{P_{max} - P_{min} }}} \right) + w_{2} \left( {1 - \frac{{E_{k} - E_{min} }}{{E_{max} - E_{min} }}} \right) + w_{3} \left( {\frac{{H_{k} - H_{min} }}{{H_{max} - H_{min} }}} \right) + w_{4} (Re)$$
(10)
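Eq. (10) can be sketched directly in code. The weights, the normalization bounds, and the input values below are hypothetical, and the helper returns 0 when a maximum equals its minimum, a guard that is an assumption added here to avoid division by zero.

```python
from typing import Dict, Tuple


def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max scale value into [0, 1]; returns 0.0 when hi == lo."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def fitness(P: float, E: float, H: float, Re: float,
            bounds: Dict[str, Tuple[float, float]],
            weights: Tuple[float, float, float, float]) -> float:
    """Weighted combination of the normalized criteria, following Eq. (10)."""
    w1, w2, w3, w4 = weights
    return (w1 * normalize(P, *bounds["P"])
            + w2 * (1.0 - normalize(E, *bounds["E"]))  # high energy lowers the term
            + w3 * normalize(H, *bounds["H"])
            + w4 * Re)


# Hypothetical bounds (min, max) and equal weights.
bounds = {"P": (0.0, 2.0), "E": (20.0, 60.0), "H": (4.0, 12.0)}
weights = (0.25, 0.25, 0.25, 0.25)
value = fitness(P=1.0, E=50.0, H=6.0, Re=0.5, bounds=bounds, weights=weights)
print(value)  # 0.375
```

In this example the normalized terms are 0.5, 0.25, 0.25, and 0.5, so equal weights of 0.25 give 0.375.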

In this equation, \(P_{min}\), \(P_{max}\), \(E_{min}\), \(E_{max}\), \(H_{min}\), and \(H_{max}\) normalize the terms so that they are on the same scale. \(P_{min}\) and \(P_{max}\) are the lowest and highest feasible values of the displacement probability, \(E_{min}\) and \(E_{max}\) bound the residual-energy term, and \(H_{min}\) and \(H_{max}\) are the minimum and maximum achievable values of the hop count distance. Normalizing each term by its minimum and maximum gives a consistent evaluation across diverse scenarios or datasets within IoT networks, which allows fair comparison and helps identify solutions that balance reliability, energy efficiency, and performance. Besides, \(w_1\), \(w_2\), \(w_3\), and \(w_4\) are the weights of displacement probability, residual energy, hop count distance, and reliability (Re), respectively. Figure 1d shows the objective function of the problem. After the spanning tree is created and these values are calculated, the criteria of displacement probability, residual energy, and hop count distance are combined in the objective function. Sensing, sending and receiving, and data processing all consume energy, and sending a packet uses more energy than receiving one. Figure 2 shows the radio energy model for sending and receiving packets. The energy required to send a packet of k bits over a distance d, and the energy required to receive k bits, are given in the following equations [41].

$$\begin{aligned}E_{Tx}\left(k,d\right)&=E_{Tx-elec}\left(k\right)+E_{Tx-amp}\left(k,d\right)\\ &=E_{elec}\cdot k+\varepsilon_{amp}\cdot k\cdot d^{2}\end{aligned}$$
(11)
$$\begin{aligned}E_{Rx}\left(k\right)&=E_{Rx-elec}\left(k\right)\\ &=E_{elec}\cdot k\end{aligned}$$
(12)
Fig. 2

Transmitter and receiver radio model

Here, \(E_{elec}\) is the energy required to send or receive one bit of information, and \(\varepsilon_{amp}\) is the energy required to amplify the transmitted signal over a distance. For transmission, the model combines the electronics energy and the amplifier energy for \(k\) bits over a distance \(d\). For reception, the energy required for \(k\) bits is solely the electronics energy. The transmission distance, the packet size, and the parameters \(E_{elec}\) and \(\varepsilon_{amp}\) therefore determine these energy costs.
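Eqs. (11) and (12) translate into two short functions. The source does not state numeric values for \(E_{elec}\) and \(\varepsilon_{amp}\); the defaults below (50 nJ/bit and 100 pJ/bit/m²) are values commonly used with this kind of first-order radio model and are assumptions here, as are the function names.

```python
E_ELEC = 50e-9     # J per bit for the radio electronics (assumed value)
EPS_AMP = 100e-12  # J per bit per m^2 for the amplifier (assumed value)


def tx_energy(k_bits: int, d_m: float,
              e_elec: float = E_ELEC, eps_amp: float = EPS_AMP) -> float:
    """Energy in joules to send k bits over d metres, as in Eq. (11)."""
    return e_elec * k_bits + eps_amp * k_bits * d_m ** 2


def rx_energy(k_bits: int, e_elec: float = E_ELEC) -> float:
    """Energy in joules to receive k bits, as in Eq. (12)."""
    return e_elec * k_bits


# A 1000-bit packet sent over 10 m costs more to send than to receive.
sent = tx_energy(1000, 10.0)
received = rx_energy(1000)
print(sent, received)
```

Because the amplifier term grows with the square of the distance, a tree with short links between neighbors tends to spend less transmission energy than one with a few long links, under these assumptions.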

So, the four-layer IoT model constitutes the sensor, network, management, and application layers, each serving distinct purposes within the system. At the sensor layer, sensors, actuators, and tags collaborate to establish an energy-efficient and reliable tree structure for data transmission. Meanwhile, the network layer oversees diverse networks such as mobile communication, Wi-Fi, and Ethernet, pivotal in aggregating and transmitting data once the tree structure is established. IoT devices utilize this structure to relay data to the base station, which is subsequently forwarded to the management layer via the Internet, responsible for decision-making, data analysis, device management, and security control. This layer also serves as the arena for simulating and implementing proposed algorithms. These layers, namely the Sensor, Network, Management, and Application, intertwine synergistically, collectively facilitating efficient data aggregation and management within the IoT ecosystem.

Also, the process of searching for IoT resources significantly contributes to the overall functionality of IoT systems by enabling efficient resource discovery and utilization. This procedure, divided into phases of data preparation, indexing, and searching, streamlines access to real-time resources within IoT environments. Firstly, through data preparation, datasets containing crucial information about object statuses and measurements are readied for storage and indexing, allowing for the effective utilization of IoT-generated datasets. Subsequently, the indexing phase, where IoT spiders or crawlers analyze and construct indexes from the data streams, facilitates swift and systematic resource retrieval. These constructed indexes, spanning from fog-edge nodes to higher-level cloud infrastructure, enhance the accessibility and organization of available resources. Finally, the searching phase empowers end-users to execute queries efficiently, retrieving prioritized lists of resources, thereby optimizing the utilization of IoT resources for various applications. This streamlined resource discovery process enhances the functionality of systems by ensuring quick, targeted access to essential resources, thereby facilitating smoother and more effective operations [42, 43]. Also, the four-layer IoT model, comprising the sensor, network, management, and application layers, harmonizes a synchronized methodology to optimize data aggregation and management within the IoT ecosystem. At the sensor layer's nucleus, the collaborative synergy of sensors, actuators, and tags is directed toward the establishment of an energy-efficient and dependable tree structure for data transmission. This structured tree, once instantiated, serves as the cornerstone for efficient data aggregation in the network layer. Integral networks like mobile communication, Wi-Fi, and Ethernet assume a pivotal role in the aggregation and transmission of data facilitated by the established tree structure. 
IoT devices adeptly exploit this framework to convey data to the base station, thereby instigating subsequent transmission to the management layer through the Internet. In the management layer, the proposed algorithm undergoes rigorous simulation and implementation, systematically addressing pivotal challenges encompassing data analysis, IoT device management, and security control. This layer orchestrates decisions and actions, forming the bedrock for proficient data aggregation and streamlined network performance. The seamless integration of these layers ensures not only dependable data transmission but also confronts key challenges associated with data analysis, device management, and security, thereby amplifying the overall efficiency and resilience of the IoT ecosystem.

3.3 Aggregation method

The Aggregation Method, leveraging an artificial bee cloning technique, orchestrates the selection of optimal spanning trees crucial for IIoT data collection. Mirroring the roles of employed, onlooker, and scout bees, the method iterates through solution exploration, information sharing, and random exploration to pinpoint high-fitness solutions. Employed bees actively probe the solution space, relaying insights to onlooker bees who favor superior solutions based on fitness evaluations. Should solutions fall short, scout bees venture into random exploration. This process culminates in the selection of trees whose aggregated values signify their efficacy in data collection within IIoT environments. Through fitness-guided selections, the method ensures reliable data aggregation, seamlessly transitioning to alternative high-fitness trees in the event of device failure or malfunction, thereby ensuring robust and dependable data-gathering strategies. The method employs an artificial bee colony approach for data aggregation in IIoT, using three types of bees: employed, onlooker, and scout. Employed bees explore the solution space, extracting insights and sharing information about food resources with onlooker bees. Onlooker bees choose the best resources based on fitness evaluations and perform a neighborhood search. If employed bees fail to find new solutions, they transform into scout bees, seeking new random resources. This process iterates, combining genetic operators to generate new solutions until termination criteria are met. The employed bees actively probe the solution space, while onlooker bees select superior solutions. Scout bees explore randomly when solutions fail, ensuring constant search and adaptation until optimal solutions for data aggregation are achieved. So, algorithm adjustments are needed to improve the artificial bee cloning method’s efficiency and use it for IIoT data aggregation challenges [44]. 
Each point in the search space corresponds to a solution that artificial bees can exploit in the cloning technique. The sum of the aggregated values of the tree parameters indicates the solution's fitness. In a bee colony, there are three sorts of bees: onlooker bees, employed bees, and scout bees. The employed bee stays in the current solution and collects information about neighboring solutions in its memory. The onlooker bee receives the solution information from the employed bee and selects one solution; the scout bee is responsible for finding the new solution. The number of employed bees is the same as the number of onlooker bees, and the number of solutions is the same as the number of onlooker bees [45].
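The onlooker step, in which better solutions are more likely to be chosen, is often implemented as roulette-wheel selection. The sketch below assumes that each candidate tree has a non-negative score for which higher is better; a combined value whose terms are meant to be minimized would first have to be converted into such a score. The function names and scores are hypothetical, and this is one possible selection rule rather than the one the source confirms.

```python
import random
from typing import List


def selection_probabilities(scores: List[float]) -> List[float]:
    """Probability of picking each solution, proportional to its score."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def onlooker_pick(scores: List[float], rng: random.Random) -> int:
    """Roulette-wheel pick: index i is chosen with probability scores[i] / sum."""
    return rng.choices(range(len(scores)), weights=scores, k=1)[0]


# Hypothetical scores for two candidate spanning trees.
probs = selection_probabilities([1.0, 3.0])
print(probs)  # [0.25, 0.75]

rng = random.Random(42)
picks = [onlooker_pick([1.0, 3.0], rng) for _ in range(1000)]
print(picks.count(1) / len(picks))  # close to 0.75
```

Proportional selection keeps weaker trees in play with a small probability, which preserves some diversity before scout bees introduce entirely new random solutions.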

Employed bees exploit the spanning trees they have discovered and pass the information to onlooker bees using the bee dancing operator. Onlooker bees assess the fitness of the spanning trees and choose one to exploit; the greater the fitness of a tree, the more likely it is to be selected. Employed bees whose solutions have been abandoned become scout bees, which search the entire environment randomly. These procedures are repeated until the termination requirement is satisfied. Because the artificial bee cloning algorithm offers advantages in solving optimization problems, it is used as the basis for the proposed algorithm, with genetic operators applied in its second phase. For data gathering, the method generates a set of trustworthy spanning trees. First, the tree with the greatest fitness is used to collect data. This tree remains usable until one device fails due to energy dissipation or excessive internal node displacement and failure. In that case, the tree is split and data gathering over it stops, so the next high-fitness spanning tree is used. This process is repeated until all of the trees have been utilized. The steps of the suggested method are as follows:

The first step applies a clustering method based on the data density correlation degree, which plays a critical role in data aggregation for sensor nodes. The measure rests on two definitions. The first characterizes a core sensor node: a sensor is termed a core sensor node if the data from a given number of its neighbor nodes are similar to its own within specified thresholds [47]. The second defines the data density correlation degree of each sensor node as a weighted function of the number of similar neighboring data, their distances to the node's data, and the thresholds. This degree quantifies how strongly a node's data correlate with the data around it, so that the selected nodes represent cohesive local patterns while the influence of unrelated data is limited, which is vital for accurate aggregation and subsequent decision-making [46].

Definition 1.

Core sensor node: Consider a sensor node \(v\) that has \(n\) neighbor nodes \({v}_{1}, {v}_{2}, ..., {v}_{n}\); the data of \(v, {v}_{1}, {v}_{2}, ..., {v}_{n}\) are denoted by \(D,{D}_{1},{D}_{2}, ...,{D}_{n}\). If there are m data in \({D}_{1},{D}_{2}, ...,{D}_{n}\) whose distances to the data \(D\) are less than ε and minp ≤ m ≤ n, the sensor \(v\) is called a core sensor node, where ε and minp are the data threshold and amount threshold, respectively.

Definition 2.

Data density correlation degree: The data density correlation degree of a sensor node v is defined as follows [48]:

$$Sim\left( v \right) = \begin{cases} 0, & m < minp \\ a_{1} \left( 1 - \frac{1}{\exp \left( N - minp \right)} \right) + a_{2} \left( 1 - \frac{d\Delta }{\varepsilon } \right) + a_{3} \left( 1 - \frac{d}{\varepsilon } \right), & m \ge minp \end{cases}$$
(13)

where \(d\) is the average distance between the m data and the data \(D\), and \(d\Delta\) is the distance between the data center of the m data and the data \(D\). The weights \({a}_{1}\), \({a}_{2}\), and \({a}_{3}\) satisfy \({a}_{1}+{a}_{2}+{a}_{3}=1\). With the data density correlation degree \(Sim(v)\) of sensor node \(v\) defined by Eq. (13), the following properties hold: (1) \(Sim(v)\) increases as \(N\) increases, where \(N\) is the number of data objects in the ε-neighborhood of \(D\) (the count m of Definition 1); (2) \(Sim(v)\) increases as \(d\Delta\) decreases, where \(d\Delta\) is the distance between \(D\) and the data center of the data objects in the ε-neighborhood of \(D\); (3) \(Sim(v)\) increases as \(d\) decreases, where \(d\) is the average distance between \(D\) and the data objects in the ε-neighborhood of \(D\); (4) \(Sim(v)\in [0,1]\). These properties are consistent with intuition. In Definition 2, the data threshold ε guarantees that \(Sim(v)\) is not affected by unrelated data. The amount threshold minp is the minimum number of similar neighbors required for sensor node \(v\) to represent other sensor nodes. To illustrate the validity of the data density correlation degree defined by Eq. (13), let the two-dimensional data objects of sensor nodes \({v}_{0}, {v}_{1}, {v}_{2}, ..., {v}_{n}\) be \({D}_{0},{D}_{1},{D}_{2}, ...,{D}_{n}\), respectively, where \({v}_{1}, {v}_{2}, ..., {v}_{n}\) are in the ε-neighborhood of \({v}_{0}\). Then \(Sim({v}_{0})\) is given by Eq. (13) [47].
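To make Definition 2 concrete, the following minimal sketch evaluates Eq. (13) for one node. It is an illustration under stated assumptions rather than the implementation used in the paper: the function name `sim`, the use of Euclidean distance, and the equal default weights are all assumptions, and the count of data inside the ε-neighborhood is used for both m and N.

```python
import math

def sim(core, neighbors, eps, min_pts, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Data density correlation degree of one node, following Eq. (13).

    core      -- the node's own data D (a tuple of coordinates)
    neighbors -- data D_1..D_n of the neighbor nodes
    eps       -- data threshold (epsilon)
    min_pts   -- amount threshold (minp)
    weights   -- (a1, a2, a3), assumed here to be equal and to sum to 1
    """
    a1, a2, a3 = weights
    # Keep only the neighbor data inside the eps-neighborhood of D.
    # Euclidean distance is an assumption; the source does not fix a metric.
    close = [p for p in neighbors if math.dist(core, p) < eps]
    n = len(close)
    if n < min_pts:
        return 0.0
    # d: average distance between D and the close data.
    d = sum(math.dist(core, p) for p in close) / n
    # d_delta: distance between D and the data center (centroid) of the close data.
    centroid = tuple(sum(axis) / n for axis in zip(*close))
    d_delta = math.dist(core, centroid)
    return (a1 * (1 - 1 / math.exp(n - min_pts))
            + a2 * (1 - d_delta / eps)
            + a3 * (1 - d / eps))
```

Because only data closer than ε enter the computation, both `d` and `d_delta` stay below ε, so the returned value remains in [0, 1], in line with property (4).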

The second step generates the food resources, or primary spanning trees, and initializes the algorithm. The initial population of trees is generated randomly from the given graph; each member of this population, that is, each spanning tree, is a possible solution to the optimization problem. A tree \({t}_{k}\) is modeled as a binary array whose length equals the number of graph edges. To create \({t}_{k}\), |v|-1 elements of the array are selected randomly and set to 1; the other elements are set to 0. Thus, graph links that belong to the tree are marked 1, and links that do not are marked 0. The selected links may form a loop, so the process is repeated until a tree is obtained.
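The initialization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the edge-list representation, and the union-find loop check are assumptions, and the graph is assumed to be connected (otherwise the retry loop would never terminate).

```python
import random

def is_spanning_tree(bits, edges, n_nodes):
    """Return True if the edges marked 1 form a loop-free tree over all nodes."""
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    selected = 0
    for bit, (u, v) in zip(bits, edges):
        if bit:
            ru, rv = find(u), find(v)
            if ru == rv:
                return False  # this edge would close a loop
            parent[ru] = rv
            selected += 1
    # |v|-1 loop-free edges necessarily connect all |v| nodes.
    return selected == n_nodes - 1

def random_tree(edges, n_nodes, rng=random):
    """Sample binary edge vectors with |v|-1 ones until one encodes a tree."""
    while True:
        bits = [0] * len(edges)
        for i in rng.sample(range(len(edges)), n_nodes - 1):
            bits[i] = 1
        if is_spanning_tree(bits, edges, n_nodes):
            return bits
```

An initial population is then obtained by calling `random_tree` once per food resource. Rejection sampling mirrors the repeat-until-tree rule in the text; for large, sparse graphs a constructive method would likely be faster, but that is outside what the source describes.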

The third step is the employed bee phase, in which employed bees move randomly toward food resources and search their neighborhoods. After the primary trees are produced, each employed bee moves randomly to a food resource (tree); after extracting the nectar and returning to the hive, it searches the neighborhood in the hope of finding a tree with higher fitness. As in the initialization step, the solutions produced may contain loops, so the employed bees repeat this step until the result is a loop-free spanning tree. Because the employed bees explore the whole problem space in this phase, exploration is broad, which increases the likelihood of discovering high-fitness configurations and yields diverse, reliable spanning trees for data aggregation in IoT networks.

The fourth step explores neighborhoods to discover new food resources (spanning trees): employed bees apply genetic operators to produce new spanning trees. It comprises four phases. In the first phase, four food resources (spanning trees) are selected as parents: the current food resource, two random food resources, and the best food resource. The current food resource is chosen in turn; for instance, if the initial population is 100, the first tree is selected in the first iteration, the second tree in the second iteration, and so on. Two further food resources are chosen randomly, and the best spanning tree of the population serves as the best food resource. These four food resources are randomly matched as parents, as shown in Fig. 3 [25]. In the second phase, the matched parents are combined with a two-point merger operator to produce up to eight offspring food resources.

Fig. 3 Selecting the current food resource, random food resources, and the best food resource as parents

As illustrated in Fig. 4, the current food resource is combined with random food resource 1, random food resource 1 with random food resource 2, random food resource 2 with the best food resource, and the best food resource with the current food resource, each using the two-point merger operator. For this operator, two positions on the binary vectors are selected randomly, and the parts of the vectors between these positions are exchanged. A repair step follows the two-point merger, because a tree requires exactly |v|-1 edges, so the number of 1 elements must be one less than the number of vertices of the graph. If there are too many 1s, the surplus 1s are removed randomly; if there are too few, 0 elements are set to 1 until the required number is reached. In the third phase, a mutation operator is applied to the offspring to generate mutants. A swap is used as the mutation operator: a pair of elements with values 0 and 1 is selected randomly on the vector, and their values are exchanged. All 8 offspring are mutated in the same way, producing 8 mutant offspring, as shown in Fig. 5. In the fourth phase, the fitness of the offspring and mutant food resources is calculated, and the best food resource (solution) among the 16 new candidates is selected, as shown in Fig. 6.
The two-point merger operator is therefore pivotal in the fourth step of the artificial bee cloning algorithm. By randomly exchanging segments between the binary vectors that represent spanning trees, it merges genetic information from different parents and generates diverse offspring. This lets the algorithm explore new neighborhoods effectively and potentially discover improved configurations for data aggregation in IoT networks.
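The genetic operators of the fourth step can be sketched as below. This is a hedged illustration under assumptions: all function names are hypothetical, the repair step picks the 0 elements to flip at random (the source does not say how they are chosen), and the loop check that the source requires for every new vector is left out for brevity.

```python
import random

def two_point_crossover(p1, p2, rng=random):
    """Exchange the segment between two random cut points of two parents."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    child1 = p1[:i] + p2[i:j] + p1[j:]
    child2 = p2[:i] + p1[i:j] + p2[j:]
    return child1, child2

def repair(bits, n_ones, rng=random):
    """Force the vector to contain exactly n_ones ones (|v|-1 tree edges)."""
    bits = list(bits)
    ones = [k for k, b in enumerate(bits) if b == 1]
    zeros = [k for k, b in enumerate(bits) if b == 0]
    if len(ones) > n_ones:
        for k in rng.sample(ones, len(ones) - n_ones):
            bits[k] = 0  # remove surplus edges at random
    elif len(ones) < n_ones:
        for k in rng.sample(zeros, n_ones - len(ones)):
            bits[k] = 1  # add missing edges (random choice is an assumption)
    return bits

def swap_mutation(bits, rng=random):
    """Pick one 1 and one 0 at random and exchange their values."""
    bits = list(bits)
    ones = [k for k, b in enumerate(bits) if b == 1]
    zeros = [k for k, b in enumerate(bits) if b == 0]
    if ones and zeros:
        bits[rng.choice(ones)] = 0
        bits[rng.choice(zeros)] = 1
    return bits

def neighborhood(current, rand1, rand2, best, n_ones, rng=random):
    """Return 8 repaired offspring followed by their 8 mutants (16 candidates)."""
    pairs = [(current, rand1), (rand1, rand2), (rand2, best), (best, current)]
    offspring = []
    for a, b in pairs:
        for child in two_point_crossover(a, b, rng):
            offspring.append(repair(child, n_ones, rng))
    mutants = [swap_mutation(child, rng) for child in offspring]
    return offspring + mutants
```

The swap mutation keeps the number of 1 elements unchanged, so mutants need no second repair; they still have to pass the loop check before their fitness is compared.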

Fig. 4 Merging parents’ food resources randomly with the two-point merger operator and offspring production

Fig. 5 Applying the mutation operator to the offspring and producing a mutant generation

Fig. 6 Calculating the fitness of food resources of offspring and mutants

In the fifth step, the new food resource is compared with the current one and replaces it if it is superior; the best solution is also updated. Specifically, the best neighbor selected in the fourth step is compared with the current food resource (spanning tree). If its fitness value is better (lower) than that of the current food resource, the current food resource is removed and replaced with the new one. Otherwise, the current food resource remains unchanged, and its trial index is incremented by one. Each iteration also updates the population's best solution, that is, the best spanning tree.

The sixth step is the onlooker bee phase, in which onlooker bees move toward food resources chosen with a roulette wheel and then perform a neighborhood search. After the employed bee phase is completed, the employed bees return to the hive and inform the other bees about the food resources found. This is done by bee dancing: by dancing in the dance area inside the hive, a bee indicates the distance of the new food resource from the hive and its angle relative to the sun, encouraging other bees to follow it to the valuable food resource and extract nectar. This behavior is simulated with a roulette wheel as follows. First, the fitness and selection probability of each food resource are determined using Eqs. (14) and (15) [49]:

$${fitness}_{i}=\begin{cases}\frac{1}{1+{fit}_{i}}, & {fit}_{i}\ge 0\\ 1+abs\left({fit}_{i}\right), & {fit}_{i}<0\end{cases}$$
(14)
$${P}_{i}=\frac{{fitness}_{i}}{\sum_{j=1}^{SN}{fitness}_{j}}$$
(15)

Eq. (14) gives the fitness of each solution, and Eq. (15) gives the probability of selecting each food resource; this probability is a number between zero and one, and the probabilities of all solutions sum to one. In Eq. (15), the probability of selecting tree \(i\) equals the ratio of the fitness value of tree \(i\) to the total fitness of all SN trees (solutions). Because the probabilities sum to 1, each tree is assigned a sub-interval of [0, 1] whose length equals its selection probability, and a random number between 0 and 1 is generated; the tree whose sub-interval contains this number is selected. Because the onlooker bee phase searches with a roulette wheel and selection probabilities, it provides better exploitation; in effect, the answer to the problem is found faster. The artificial bee colony algorithm uses both the employed bee and onlooker bee phases to combine their advantages: exploration by the employed bees and rapid exploitation by the onlooker bees. After using the roulette wheel, the onlooker bee moves toward the selected food source. After collecting nectar and returning to the hive, it searches the neighborhood for a fitter tree. The only difference from the employed bee phase is that the current food resource is selected using a roulette wheel.
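The roulette wheel built from Eqs. (14) and (15) can be sketched as follows. The function names are illustrative assumptions; the input `fits` stands for the raw values \({fit}_{i}\) of the SN food resources.

```python
import random
from itertools import accumulate

def abc_fitness(fit):
    """Eq. (14): map a raw value fit_i to a positive fitness."""
    return 1.0 / (1.0 + fit) if fit >= 0 else 1.0 + abs(fit)

def selection_probabilities(fits):
    """Eq. (15): normalize the fitness values so that they sum to 1."""
    fitness = [abc_fitness(f) for f in fits]
    total = sum(fitness)
    return [f / total for f in fitness]

def roulette_select(probs, rng=random):
    """Draw r in [0, 1) and return the index whose sub-interval contains r."""
    r = rng.random()
    for i, upper in enumerate(accumulate(probs)):
        if r < upper:
            return i
    return len(probs) - 1  # guard against floating-point round-off
```

For raw values 0, 1, and 3, Eq. (14) gives fitness values 1, 0.5, and 0.25, so the selection probabilities are 4/7, 2/7, and 1/7: the food resource with the lowest raw value is drawn most often.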

The seventh step involves waiting for information from the employed bees regarding the nectar yield of the various food positions. It includes two phases: (1) using the employed bees' information, the onlooker bees choose food sources and exploit those locations; (2) the scout bees find new random food positions. In the eighth step, the new food resource is again compared with the current one and replaces it if it is superior, and the best solution is updated. As in the fifth step, the best neighbor is compared with the current food resource (spanning tree). If its fitness value is better (lower) than that of the current food resource, the current food resource is removed and replaced with the new one. Otherwise, the current food resource remains unchanged, and its trial index is incremented by one. Each iteration also updates the population's best solution, that is, the best spanning tree.

The ninth step is the scout bee phase, in which food resources (spanning trees) are reinitialized randomly if no improvement is observed within the trial limit. If the employed bees and the onlooker bees cannot find a better food resource in the neighborhood of a food resource after several visits, that food resource is considered abandoned: its nectar, that is, the fitness of the solution, is poor and not worth further search. The employed bee whose food source is abandoned becomes a scout bee, randomly selects a new food resource (solution), and replaces the abandoned one with it.
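The interplay between the trial index of steps five and eight and the scout bee phase can be sketched as follows. This is a minimal illustration under assumptions: the names `greedy_update`, `scout_phase`, `cost`, and `limit` are hypothetical, lower `cost` is treated as better (following the "(lower)" convention of step five), and resetting the trial index after a replacement is the usual artificial bee colony convention rather than something the source states.

```python
def greedy_update(population, trials, i, candidate, cost):
    """Steps five and eight: keep the better tree and track stagnation."""
    if cost(candidate) < cost(population[i]):
        population[i] = candidate
        trials[i] = 0          # assumed reset after an improvement
    else:
        trials[i] += 1         # one more unsuccessful visit

def scout_phase(population, trials, limit, new_random_solution):
    """Step nine: replace food resources whose trial index exceeds the limit."""
    for i in range(len(population)):
        if trials[i] > limit:
            population[i] = new_random_solution()  # scout finds a random resource
            trials[i] = 0
```

In a full run, `new_random_solution` would be the same random tree generator used for initialization, and `cost` would be the aggregated objective of a spanning tree.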

The tenth and final step repeats steps three through nine until the termination criteria are satisfied. The termination condition can be a number of iterations, a number of function calls, reaching a specific answer, or elapsed time; here, steps 3 to 9 are repeated for the specified number of iterations. Finally, the optimal food source (tree) is selected for aggregation, and if that tree fails because of IoT device malfunction or power loss, the next highest-ranking trees are used as alternatives. The algorithm thus creates a set of reliable spanning trees for data collection. Data collection first uses the tree with the highest fitness. This tree remains usable until a device fails due to energy dissipation or malfunction. In that case, the tree is broken and data collection over it stops, so the next spanning tree with high fitness is used.

4 Simulation of the proposed method

In this section, the performance of the proposed algorithm on the data aggregation problem is compared with that of the RST-IoT algorithm and the spanning tree construction algorithm based on HCT-IoT [50].

4.1 Experimental data and simulation parameters

MATLAB is used to simulate the proposed method. All tests are run on a computer with Windows 7, a 2.5 GHz Intel Core i5 processor, and 4 GB of RAM.

4.2 Checking the best fitness values

The dimensions of the simulation environment are 120 m × 120 m, with 20 devices distributed randomly over the monitoring area. These devices are assumed to be heterogeneous, meaning that the initial energies and the probabilities of their displacement are not the same [51]. Table 1 shows the values of the parameters used.

Table 1 Values of the parameters used

The RST algorithm in IoT and the proposed algorithm, which employs artificial bee cloning with genetic operators, show notable performance differences. Each algorithm was run twenty times independently, and mean values were used to assess efficiency. The proposed algorithm was examined on four parameters: reliability, energy consumption, displacement probability, and distance. The fitness values of the constructed trees, which are central to evaluating algorithm effectiveness, are discussed first. Figure 7a–d depicts the best fitness values for networks of 100, 200, 300, and 400 nodes, respectively. In these figures, the proposed algorithm finds better solutions. As expected, the execution time of the proposed algorithm increases with the number of nodes, but it remains acceptable. Execution times for both algorithms are given in Table 2. The average residual energy of the IoT system, computed from the sum of the residual energy of all devices, was examined next. Figure 8a shows the average residual energy for 200 nodes. The proposed approach achieves higher average residual energy than the RST-IoT and HCT-IoT algorithms, which implies better energy retention. Overall, the evaluation indicates that the proposed algorithm performs better across these parameters, producing improved solutions and preserving energy within IoT systems, which supports its potential for application in IoT frameworks.

Fig. 7 a Investigating the fitness of a tree with n = 100. b Investigating the fitness of a tree with n = 200. c Investigating the fitness of a tree with n = 300. d Investigating the fitness of a tree with n = 400

Table 2 Execution times of the algorithms

Fig. 8 a Investigating the average residual energy against the number of nodes. b Investigating reliability against the number of broken nodes. c Investigating reliability against the number of nodes. d Investigating the displacement rate in the suggested approach and comparing it to the other two methods

Data collection reliability is a central concern in IoT. The simulations compared the reliability of DABCG-IoT, RST-IoT, and HCT-IoT. Figure 8b shows the comparison as the number of failures changes from 10 to 50 with 20 nodes. DABCG-IoT is the most reliable of the three algorithms, improving on RST-IoT and HCT-IoT by 6% and 28%, respectively. This improvement supports the robustness and dependability of DABCG-IoT and makes it a promising solution for consistent data collection within IoT frameworks.

Figure 8c shows how the reliability of the considered algorithms changes with the number of nodes. Reliability decreases as the IoT network grows. DABCG-IoT is the most reliable, improving on the RST-IoT and HCT-IoT algorithms by 8% and 49%, respectively, which indicates robustness across network scales. Figure 8d examines the displacement rate against the node count. The displacement rate increases with the number of nodes in the network. In this respect, the RST-IoT algorithm outperforms the proposed approach, while the proposed method outperforms the HCT-IoT algorithm, so the ranking of the algorithms for displacement rate differs from their ranking for reliability. The higher reliability of DABCG-IoT in Fig. 8c can be attributed to its adaptive design, which adjusts dynamically to changing network sizes. Key to its performance is a redundancy strategy that mitigates the impact of node expansion and sustains dependability. The algorithm also employs intelligent data aggregation and minimizes communication errors. Its ability to maintain high reliability across diverse network scales makes it a robust choice for IoT applications.

Table 3 explores the relationship between communication delay and energy consumption in the DABCG-IoT algorithm, assuming a fixed number of nodes (50) and an energy consumption of 182.0 J in the first run. This initial value establishes a baseline, and the subsequent runs show a consistent, progressive increase in energy requirements as the delay parameter is raised. In the early runs (1–5), with relatively low delays ranging from 5 to 25 ms, energy consumption increases gradually and moderately, which indicates that the algorithm already responds to small communication latencies. As the delay surpasses 25 ms (runs 6–15), the rate of increase in energy consumption becomes more pronounced, showing heightened sensitivity to moderate and high communication delays. These trends point to a trade-off between communication delay and energy consumption that designers and implementers must weigh when aiming for efficient data aggregation in real-world IoT deployments. Future work could explore optimization strategies, such as adaptive algorithms or dynamic parameter adjustments, to enhance the algorithm's resilience in varying network conditions and maintain energy efficiency.
It is important to note that, while the table provides valuable insights, a more rigorous analysis involving a larger dataset and statistical measures would strengthen the findings. Considerations for real-world deployment should also take into account other factors, such as reliability and execution time, for a more comprehensive evaluation of the algorithm's overall performance.

Table 3 The correlation between delay and energy use

In summary, Table 3 shows that energy consumption in the DABCG-IoT algorithm rises progressively with communication delay, most sharply in the moderate to high range (runs 6–15), while the early runs (1–5) already respond to lower delays. This reveals a trade-off between communication delay and energy efficiency that must be balanced in real-world IoT deployments. Adaptive algorithms or dynamic parameter adjustments may improve adaptability to variable network conditions. The findings would benefit from a more extensive dataset and statistical scrutiny, and a holistic evaluation should extend beyond energy consumption to factors such as reliability and execution time in dynamic IoT environments.

5 Conclusion and future work

In this paper, we proposed a method for data aggregation in IIoT that uses the artificial bee cloning algorithm and genetic operators to generate a set of reliable spanning trees for data gathering. The method involves several steps, including clustering nodes based on the data density correlation degree, initializing and producing primary spanning trees, and having employed bees search neighborhoods and find new spanning trees using genetic operators. Onlooker bees use a roulette wheel to select food resources and search their neighborhoods, and scout bees find new random food resources if no improvement is found. The process is repeated until the termination criteria are met, and the best spanning tree is selected for data aggregation. If the tree fails due to device failure or power dissipation, the next best tree is used. The method aims to improve reliability, reduce energy consumption, and extend the network lifespan by creating a suitable tree structure for data transmission. Its benefits include the ability to find good solutions to the optimization problem efficiently, to explore a wide range of possible solutions, and to adapt and improve solutions over time through the use of genetic operators. The method also has limitations. Its effectiveness depends on the termination criteria and on the assumption of a static network, which limits adaptability and scalability. Its fault tolerance relies on backup trees being immediately available, and its robustness across IoT scenarios depends on the choice of algorithm and on parameter tuning. The complexities of integrating learning techniques, such as model compatibility and data issues, have not yet been explored.

In future work, this problem could be addressed with other evolutionary algorithms, such as particle swarm optimization [52] and the multi-objective bat algorithm [53], combined with the genetic or colonial competition algorithm. Another avenue for further research is to integrate the proposed method with machine/deep learning techniques [54] to enhance its effectiveness in aggregating data in IoT networks.