Abstract

In recent years, flying ad hoc networks (FANETs) have been increasingly used for interconnectivity. In a typical FANET topology, IoT nodes are located on the ground and unmanned aerial vehicles (UAVs) collect information from them. The high mobility of UAVs causes disruptions that make it easy for intruders to launch cyberattacks such as DoS/DDoS. The physical structure of a FANET usually comprises UAVs, satellites, and a base station. IoT-based UAV networks have many applications, including agriculture, rescue operations, tracking, and surveillance. However, DoS/DDoS attacks disturb the behaviour of the entire FANET, which leads to unbalanced energy consumption, end-to-end delay, and packet loss. This study presents a detailed review of machine learning-based intrusion detection systems (IDS). In addition, a cognitive lightweight-LR approach is modeled using the UNSW-NB 15 dataset, and machine learning is applied to an IoT-based UAV network to detect possible security attacks. A queuing and data traffic model is used to implement decision tree (DT), random forest (RF), XGBoost, AdaBoost, Bagging, and logistic regression (LR) classifiers in the environment of an IoT-based UAV network. Logistic regression, the proposed approach, is used to estimate statistical probability. The experimentation is based on the binomial distribution, and logistic regression relies on a linear association between variables. Compared with the other techniques, logistic regression is lightweight and low cost. The simulation results show that logistic regression performs better than the other techniques while maintaining high accuracy.

1. Introduction

Integration of 5G wireless networks with FANETs is a new concept that is used to improve coverage and reduce delay [1–3]. The mobile ad hoc network (MANET) is the primary idea from which VANETs and FANETs emerged. UAV swarms or groups collectively form FANETs [4]. A system can consist of either a single UAV or multiple UAVs. Initially, UAVs were used only to collect data from ground IoT nodes [5]. Nowadays, however, aerial vehicles have changed many aspects of human life, including smart farming, rescue operations, border surveillance, and many more.

Compared with traditional approaches, FANETs are low cost and can be deployed almost anywhere. The high mobility of UAVs limits the energy available in the entire network. Because connectivity in FANETs is wireless, the Internet of Things plays an important role. There are two modes of communication: air-to-air (a2a) and air-to-ground (a2g) [6]. Recently, Zigbee (IEEE 802.15.4) has been introduced in FANETs for secure and long-range communication. UAV mobility patterns affect the quality of service (QoS) in IoT-based FANETs. A conventional UAV network comprises a satellite, a ground base station, and UAVs [7].

A FANET needs to be secured against cyberattacks, which reduce connectivity between nodes and interrupt communication. A false data attack is one of the most dangerous threats, for example during remote patient surgery [8]. DoS/DDoS attacks, however, can be detected with the help of an intrusion detection system (IDS). Various studies report that identifying cyberattacks in FANETs is a major problem [9]. Intruder UAVs can be used to steal data and jam potential links [10, 11]. Therefore, the proposed system model, referred to as dynamic-IDS, detects ongoing cyberattacks such as DoS/DDoS and ping of death (PoD). This study is limited to simulating the detection of attacks in FANETs. The topological arrangement of a FANET is shown in Figure 1. The main points of the paper are as follows:
(i) Machine learning algorithms such as DT, RF, XGBoost, AdaBoost, Bagging, and logistic regression are utilized
(ii) The UNSW-NB 15 dataset is used for training and testing
(iii) A cognitive lightweight-LR approach is proposed to detect attacks
(iv) A detailed comparative analysis is formulated using machine learning techniques

The major contribution of this study is to elaborate on machine learning algorithms that are used to detect possible cyberattacks. A comprehensive review is carried out to understand previous ideas and compare them with the proposed solution. The UNSW-NB 15 dataset is used for experimentation and performance analysis of the machine learning classifiers.

Figure 1 illustrates a UAV network in the presence of intruders. While UAVs collect data from IoT ground nodes, attackers inject fake data packets, which leads to misinformation. The figure also shows the FANET, which comprises a base station, a satellite, and UAVs.

Apart from that, machine learning techniques are used in IoT, ad hoc networks, software-defined networks, and many other fields. In machine learning, a dataset containing detailed data for the specific area is used. Classifiers are trained on this dataset, and their performance is then evaluated.

The rest of the article is structured as follows. Section 1 introduces the study, and Section 2 gives a brief review of the literature on the problem. Section 3 describes the machine learning algorithms, and Section 4 presents the proposed model. Section 5 reports the simulation results. Section 6 concludes the paper with the theoretical analysis and future directions.

2. Literature Review

This section discusses the limitations of traditional IDS in other fields.

IDS were initially designed for MANET, VANET, WSN, and IoT networks, which are vulnerable to cyberthreats such as sinkhole, DoS/DDoS, and PoD attacks. An attack that is initiated from inside the network is commonly called a sinkhole attack, whereas DoS/DDoS attacks make neighboring nodes unavailable to legitimate users. Abdollahi and Fathi implemented a novel IDS for the Internet of Things to identify abnormal data packets. Furthermore, false alarms and missed detections, which cause problems in the network, should be reduced [12].

A real-time IDS can capture abnormal data packets live, in contrast to an offline IDS. The KDD Cup 99 dataset is commonly used with machine learning algorithms to detect damage caused by cyberattacks [13]. Real-time IDS are therefore needed for recently emerged technologies such as FANETs.

Identifying attackers through an IDS is a widely used approach. A network-based IDS usually collects data by monitoring network traffic. Signs of intrusion and alert messages must be detected; otherwise, the IoT network slows down. Deep learning algorithms have been simulated on KDD Cup 99 with the normal, DoS, Probe, R2L, and U2R classes, and high accuracy was observed. FANETs are low cost, but intrusions can occur quite easily because of high mobility. Moreover, an intelligent intrusion detection framework for UAVs has been elaborated, and its authors proposed a signature-based IDS for FANETs [14].

Flooding attacks slow down the entire operation of a FANET. The UAV-IDS-2020 dataset, which contains unidirectional and bidirectional flows, is used in data traffic management [15]. Table 1 presents various IDS in UAV networks; most of its entries concern signature-based intrusion detection systems. Studies in various areas have been conducted to identify cyberattacks, and advanced datasets have been used to experiment with different machine learning techniques.

3. Machine Learning Classifiers/Algorithms

Machine learning is a branch of computer science that aims to enable computers to “learn” without being explicitly programmed [21, 22]. Computers “learn” in machine learning by improving their performance at tasks through “experience.” In practice, “experience” usually means fitting to data; therefore, there is no sharp boundary between machine learning and statistical techniques [23]. Machine learning techniques have shown significant promise in providing solutions to complex problems [24]. Several applications we use every day, from searching the Internet to speech recognition, are examples of the strides made in realizing the promise of machine learning [25]. Machine learning has two categories: supervised learning and unsupervised learning. Together, these two categories cover classification and clustering techniques [26]. Supervised learning includes combinations of various base classifiers, whereas unsupervised learning includes expectation-maximization algorithms as well as clustering techniques. In addition, machine learning techniques are used in many fields of study to improve overall performance.

3.1. Decision Tree

Machine learning is the process of learning or extracting novel patterns from large data sets by applying techniques from artificial intelligence. Classification and prediction are the techniques used to identify important data classes and to predict probable trends [27]. The decision tree is an important classification approach in machine learning. It is typically used in commerce, management, and fraud detection [28]. The classic decision tree methods, ID3, C4.5, and C5.0, have the advantages of high classification speed, strong learning capability, and simple structure. Yet these methods also have shortcomings in practical applications [29]. When they are used for classification, they tend to be biased toward selecting features that carry more weight while neglecting features that carry less. Decision trees are excellent tools for helping to choose the most suitable course of action [30]. They provide a highly effective structure in which one can lay out options and investigate the possible consequences of choosing each of them. A decision tree is used to represent graphically the decisions, the events, and the outcomes associated with those decisions and events [31].
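The way an ID3-style tree chooses a feature can be illustrated with a small sketch. It is not the implementation used in this study; it assumes categorical features and uses information gain (entropy reduction) as the split criterion. The records and the feature names `proto` and `rate` are hypothetical and are not taken from UNSW-NB 15.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted entropy of the child nodes after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy traffic records (illustrative only).
rows = [
    {"proto": "tcp", "rate": "high"},
    {"proto": "tcp", "rate": "low"},
    {"proto": "udp", "rate": "high"},
    {"proto": "udp", "rate": "low"},
]
labels = ["attack", "normal", "attack", "normal"]

gains = {f: information_gain(rows, labels, f) for f in ("proto", "rate")}
best_feature = max(gains, key=gains.get)  # feature the root node would split on
```

In this toy example the `rate` feature separates the two classes perfectly, so it receives the full one bit of gain, while `proto` carries no information about the label.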

3.2. Random Forest

Random forest is an approach in the field of machine learning that solves many complex problems [32]. A random forest is a combination of a series of tree-structured classifiers. This approach has numerous useful features and has been widely used in classification, prediction, and regression [33]. Compared with classic approaches, random forest has numerous advantages; thus, its range of application is extremely broad [34]. It is one of the most suitable learning approaches. Generally, this technique is a regression-tree approach that uses bootstrap aggregation and randomization of predictors to achieve a high degree of predictive accuracy [35]. The principal disadvantage of this approach is that a large number of trees can make it slow and unsuitable for real-time prediction. Generally, these models are quick to train but somewhat slow to make predictions once they are trained [36].
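The two sources of randomness mentioned above, bootstrap sampling of rows and randomization of predictors, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the configuration used in this study: the feature names are illustrative, the forest has five hypothetical trees, and the square root of the number of features is used as the subset size, which is a common default for classification.

```python
import math
import random


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_feature_subset(features, rng):
    """Pick about sqrt(d) candidate features without replacement."""
    k = max(1, int(math.sqrt(len(features))))
    return rng.sample(features, k)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Illustrative flow-feature names (assumed for this example).
features = ["dur", "sbytes", "dbytes", "rate", "sttl", "dttl", "sload", "dload", "spkts"]

# Each hypothetical tree gets its own rows and its own candidate features.
plan = [
    {"rows": bootstrap_indices(10, rng), "features": random_feature_subset(features, rng)}
    for _ in range(5)
]
```

Because every tree sees a different sample and a different feature subset, the trees are decorrelated, which is what allows their combined vote to be more accurate than any single tree.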

3.3. Extreme Gradient Boosting

XGBoost is short for extreme gradient boosting. It is a tree-based technique that falls under the supervised branch of machine learning [37]. Although it can be used for both classification and regression problems, the discussion here concerns its use for classification only [38]. It is an efficient and scalable implementation of the gradient boosting framework. It supports various objective functions, including regression, classification, and ranking [39]. In contrast to regular gradient boosting, XGBoost uses its own strategy for building trees, in which the similarity score and the gain determine the most suitable node splits [40].
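The similarity score and gain mentioned above can be made concrete with a short sketch. It follows the usual textbook formulation of the XGBoost split criterion, in which a leaf is scored as G^2 / (H + lambda) from the summed gradients G and Hessians H, and it assumes logistic loss with an initial prediction of 0.5. The regularization values `lam` and `gamma` and the four toy labels are assumptions chosen for illustration.

```python
def similarity(grad_sum, hess_sum, lam=1.0):
    """Leaf score G^2 / (H + lambda)."""
    return grad_sum ** 2 / (hess_sum + lam)


def split_gain(grads, hessians, left_mask, lam=1.0, gamma=0.0):
    """Gain of splitting a node into a left and a right child."""
    gl = sum(g for g, m in zip(grads, left_mask) if m)
    hl = sum(h for h, m in zip(hessians, left_mask) if m)
    gr = sum(grads) - gl
    hr = sum(hessians) - hl
    parent = similarity(gl + gr, hl + hr, lam)
    children = similarity(gl, hl, lam) + similarity(gr, hr, lam)
    return 0.5 * (children - parent) - gamma


# Logistic loss with an initial prediction p = 0.5 for every record:
# gradient = p - y, Hessian = p * (1 - p).
y = [1, 1, 0, 0]
p = [0.5] * len(y)
grads = [pi - yi for pi, yi in zip(p, y)]
hessians = [pi * (1 - pi) for pi in p]

good = split_gain(grads, hessians, [True, True, False, False])  # separates the classes
bad = split_gain(grads, hessians, [True, False, True, False])   # mixes the classes
```

A split that separates attack records from normal records yields a positive gain, whereas a split that leaves both children mixed yields zero gain, so the tree builder prefers the former.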

3.4. AdaBoost

Boosting is a well-known approach in machine learning for solving complex problems, and AdaBoost is the standard algorithm in the boosting family [41]. This approach has the ability to resist overfitting. Understanding this phenomenon is an interesting fundamental theoretical problem, and many studies are devoted to explaining it through statistical theory and margin theory [42]. AdaBoost was the first practical boosting algorithm and remains one of the most widely used and studied, with applications in many domains. It can also be used to boost the performance of any machine learning algorithm [43]. It works best with weak learners, that is, models that achieve accuracy just above random chance on a classification problem. The most suitable, and hence most common, learners used with AdaBoost are decision trees with one level [44].
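A single boosting round can be sketched as follows to show how AdaBoost shifts attention toward misclassified records. This is the standard discrete AdaBoost update for labels in {-1, +1}, not code from this study; the labels and the predictions of the one-level tree (`stump`) are made up for illustration.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost round for labels in {-1, +1}.

    Returns the learner weight alpha and the renormalized sample weights.
    """
    # Weighted error of the weak learner; assumes 0 < error < 1.
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified samples (t * p == -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


y_true = [1, 1, -1, -1, 1]
stump = [1, 1, -1, -1, -1]  # hypothetical weak learner that gets the last sample wrong
weights = [1 / len(y_true)] * len(y_true)
alpha, weights = adaboost_round(weights, y_true, stump)
```

After this round the single misclassified record carries half of the total weight, so the next weak learner is pushed to classify it correctly.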

3.5. Bagging Classifier

Bagging is a widely known ensemble-building strategy in which each classifier in the ensemble is trained on a separate bootstrap replicate of the training set [45]. Recent results have demonstrated that bagging can decrease the effect of outliers in the training data, particularly if the outlying observations are resampled with a lower probability [46]. It is also known as bootstrap aggregating, in which the individual models in the ensemble vote with equal weight. To promote model variance, bagging trains every model in the ensemble on a randomly drawn subset of the training set [47]. For instance, the random forest approach combines random decision trees with bagging to achieve very high classification accuracy. Bagging trains learners in parallel on small sample populations and then takes the mean of all the predictions [48]. Bagging combines predictions by voting, and every model receives equal weight. In the “idealized” version, several training sets of size n are sampled, a classifier is built for each training set, and the classifiers’ predictions are combined [49].
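The bootstrap-and-vote procedure described above can be sketched in a few lines. The base learner here is a deliberately simple threshold rule on one feature, chosen only to keep the example short; the data, the number of models, and the random seed are assumptions for illustration.

```python
import random
from collections import Counter


def train_stump(xs, ys):
    """Pick the threshold on a single feature that best separates the labels."""
    best = None
    for t in sorted(set(xs)):
        preds = [1 if x >= t else 0 for x in xs]
        correct = sum(p == y for p, y in zip(preds, ys))
        if best is None or correct > best[0]:
            best = (correct, t)
    return best[1]


def bagging_fit(xs, ys, n_models, rng):
    """Train each stump on its own bootstrap replicate of the training set."""
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]
        thresholds.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def bagging_predict(thresholds, x):
    """Majority vote in which every model has equal weight."""
    votes = Counter(1 if x >= t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical one-dimensional data: small values are normal (0), large are attacks (1).
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = bagging_fit(xs, ys, n_models=11, rng=random.Random(0))
```

An odd number of models is used so that the binary vote cannot end in a tie.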

4. Cognitive Lightweight Logistic Regression Approach

Logistic regression is used to estimate the statistical significance of each independent variable with respect to probability [50]. It is a powerful way of modelling a binomial outcome, for instance, whether or not a person will suffer from cancer, encoded with the values 0 and 1. Decision trees and logistic regression are both very popular approaches in machine learning for solving complex problems [51]. Despite their many advantages, decision trees tend to have problems handling linear relationships between variables, whereas logistic regression has problems with interaction effects between variables [52, 53]. Logistic regression is therefore lightweight and cognitive in nature. Because of its lightweight behaviour, LR is easy to deploy on the UAV network. Figure 2 presents the flow chart of the cognitive lightweight-LR approach. Equations (1) and (2) explain the logic of linear logistic regression [54].
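For reference, the standard textbook form of linear logistic regression can be written as below. The symbols are assumptions made for this sketch (x_j are the input features, beta_j the coefficients, and P the probability of the positive class) and may differ from the notation of Equations (1) and (2).

```latex
\[
z = \beta_0 + \sum_{j=1}^{n} \beta_j x_j,
\qquad
P(y = 1 \mid x) = \frac{1}{1 + e^{-z}}
\]
\[
\ln\!\left( \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} \right) = z
\]
```

The second relation shows the linear association: the log-odds of the positive class are a linear function of the features.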

Figure 2 shows the detailed flow chart of logistic regression. First, the training data are used to formulate and train the model. The cost function of logistic regression is then calculated to evaluate the fit on the data. In testing, binary classification is used: each record is labeled “0” or “1,” so the presence or absence of an attack is identified easily.
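The flow described above (train, evaluate the cost function, then classify each record as 0 or 1) can be sketched with a small gradient-descent implementation. This is a minimal illustration and not the authors' code: the two features, the learning rate, the number of epochs, the 0.5 decision threshold, and the encoding 1 = attack are all assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of the positive class for one record."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def cost(weights, bias, X, y):
    """Mean binary cross-entropy over the data set."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(weights, bias, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def train(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the cross-entropy cost."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            grad_b += err
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Hypothetical scaled features: [packet rate, mean packet size]; 1 = attack (assumed).
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train(X, y)
labels = [1 if predict_proba(weights, bias, xi) >= 0.5 else 0 for xi in X]
```

The model stores only one coefficient per feature plus a bias, and prediction is a single weighted sum, which is one way to read the claim that logistic regression is lightweight enough for deployment on a UAV.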

5. Simulation Results

The simulation environment for IoT-based UAV networks is built in Anaconda Python. The UNSW-NB 15 dataset is used, which contains various cyberattacks such as DoS/DDoS, backdoors, fuzzers, exploits, shellcode, and worms. The dataset consists of more than two million records. UNSW-NB 15 is a hybrid dataset that incorporates advanced network traffic data. Three major issues can be addressed with the UNSW-NB 15 dataset: low footprint, data traffic scenarios, and training/testing methods. Moreover, the dataset gives better results for lightweight algorithms. Binary classification is used when simulating the machine learning techniques, which include decision tree, random forest, XGBoost, AdaBoost, bagging, and logistic regression [55–64]. The data are divided into training and testing sets as follows.

5.1. Data Training

Figure 3 provides detailed information about the training dataset. About 55.06% of the training data represents security attacks, while around 44.94% represents “no attack.” The training dataset is therefore fairly balanced, which reduces false alarms.

5.2. Data Testing

Figure 4 shows the testing dataset, in which 31.94% of the records belong to the “no attack” scenario and 68.06% represent attacks.

Figure 5 depicts detailed information about the training and testing datasets. High accuracy is maintained with the UNSW-NB 15 dataset. Accuracy is assessed over two scenarios: attack and no attack. A false positive occurs when an attack is reported although in reality no attack took place. Similarly, a true negative occurs when there is no attack and none is reported.
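The terms used above can be made concrete by counting the four outcomes of a binary detector. The sketch below assumes the encoding 1 = attack and 0 = no attack; the label vectors are invented for illustration and are not results from this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary detector."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # attack reported, attack present
        elif p == positive:
            fp += 1  # attack reported, no attack present (false alarm)
        elif t == positive:
            fn += 1  # attack present but missed
        else:
            tn += 1  # no attack and none reported
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def accuracy(c):
    return (c["tp"] + c["tn"]) / sum(c.values())


def false_alarm_rate(c):
    return c["fp"] / (c["fp"] + c["tn"])


# Hypothetical ground truth and predictions (1 = attack, 0 = no attack).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
```

Accuracy alone can hide a high false alarm rate on imbalanced data, which is why both quantities are computed here.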

The overall results of the machine learning classifiers are presented in Figure 6. Logistic regression performs well in comparison with the other algorithms. LR achieves about 82.54% in detecting security attacks, compared with 71.59% for random forest, 49.54% for XGBoost, 49.17% for DT, 44.70% for Bagging, and around 28.39% for AdaBoost. Figure 6 thus summarizes the results of the various machine learning classifiers in IoT-enabled FANETs. Figure 7 shows results similar to those in Figure 6.

5.3. Comparative Discussion of ML-Based IDS

Table 2 gives a detailed comparison of ML-based intrusion detection systems. Network-based IDS are widely used, and anomaly-based IDS are also a popular approach to detecting cyberattacks. An anomaly-based IDS requires a threshold to be designed for identifying security attacks, whereas a signature-based IDS must store the features of known attacks in a database. A hybrid IDS combines the anomaly and signature approaches, but it is used less often. The proposed solution therefore offers better possibilities for detecting cyberattacks. Table 2 also lists the studies that cover different types of intrusion detection systems and shows that machine learning-based IDS have been widely used in previous work.
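The difference between the three IDS types can be illustrated with a small sketch. It is a simplified assumption-laden example, not a design from the surveyed studies: the signature names are hypothetical, the baseline packet rates are made up, and the anomaly threshold is taken as the mean plus three standard deviations of normal traffic.

```python
import statistics

# Hypothetical database of known attack signatures.
KNOWN_SIGNATURES = {"ping-of-death", "syn-flood"}


def signature_alert(event_name):
    """Signature-based IDS: alert only on attacks already stored in the database."""
    return event_name in KNOWN_SIGNATURES


def anomaly_threshold(baseline_rates, k=3.0):
    """Anomaly-based IDS: threshold = mean + k * standard deviation of normal traffic."""
    mean = statistics.mean(baseline_rates)
    std = statistics.pstdev(baseline_rates)
    return mean + k * std


def anomaly_alert(rate, threshold):
    return rate > threshold


def hybrid_alert(event_name, rate, threshold):
    """Hybrid IDS: alert if either detector fires."""
    return signature_alert(event_name) or anomaly_alert(rate, threshold)


# Packets per second observed under normal load (made-up values).
baseline = [100, 110, 90, 105, 95]
threshold = anomaly_threshold(baseline)
```

The signature detector cannot flag an attack it has never seen, while the anomaly detector can, at the cost of depending on a well-chosen threshold.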

6. Conclusion

Machine learning-based techniques are deployed in IoT-based UAV networks. The main aim of this study is to propose a novel concept for detecting abnormal behaviour using machine learning. A flying ad hoc network is a group of UAVs that together form a network. The FANET structure consists of UAVs, satellites, and ground base stations, while IoT sensor nodes are deployed on the ground and UAVs collect information from them. The cognitive lightweight-LR approach reduces false alarms and maintains high accuracy in the IoT-based UAV network. The UNSW-NB 15 dataset is used to evaluate performance. Security is now a major concern in almost every field of study, and a FANET-based IDS is the approach used to detect possible cyberattacks. The proposed approach minimizes overhead and detects false data packets easily. The simulation results show that logistic regression performs better than the other techniques. The concept of IoT-based UAV networks can be merged with smart cities in the near future. In addition, optimization techniques and graph theory will give new directions to this study. Data traffic models and new datasets are needed for futuristic cities.

6.1. Future Direction

In the near future, UAV networks will be widely used for flying taxis in smart cities. Artificial intelligence, machine learning, deep learning, reinforcement learning, and federated learning can therefore be used to build intelligent IDS that detect cyberattacks. In smart cities, the Internet of Everything will be used to advance communication. Routing protocols and communication standards need further investigation. Novel datasets also need to be designed, which will help researchers and scientists with further experimentation [77–79].

Data Availability

All the data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.