Abstract

With the growing popularity of WiFi devices, WiFi-based indoor localization has become an active research topic. Traditional WiFi-based localization techniques that rely on the received signal strength indication (RSSI) suffer from indoor multipath effects, which degrade localization performance. Choosing an appropriate characteristic of the WiFi signal is therefore crucial for indoor localization. To improve localization accuracy, we propose PLAP, a passive localization method that uses both the amplitude and the phase of channel state information (CSI). Specifically, a Hampel filter is used to process the amplitude signals, and a linear transformation is employed to calibrate the phases. To extract representative features from the calibrated amplitude and phase signals, we develop a deep learning framework that combines a convolutional neural network (CNN) and a bidirectional gated recurrent unit (BGRU) to estimate the location of a target. Experimental results from real-world evaluation show that the proposed PLAP outperforms the baselines.

1. Introduction

Indoor localization [1, 2] is widely used in many applications, such as searching for and rescuing survivors after earthquakes [3], mining safety monitoring, battlefield military applications, patient monitoring, and intrusion detection [4, 5], among others [6–8]. Wearable-sensor-based localization techniques have been proposed and show decent performance for indoor localization. However, it is not convenient to equip targets with sensors in certain scenarios. In contrast, device-free passive localization methods can detect targets without attaching any device to them. Although camera-based localization achieves high accuracy, it only works well in line-of-sight (LOS) environments. Several indoor localization technologies based on other devices have also been proposed, such as ultra-wideband (UWB) and infrared. However, UWB requires expensive equipment, while infrared suffers from a short transmission distance and requires a large number of sensors to be deployed, resulting in high hardware cost [9].

WiFi networks [10] are now ubiquitous indoors. By capturing the changes that targets induce on wireless links, researchers have realized passive localization of moving targets. WiFi-based indoor localization can overcome the disadvantages of the aforementioned methods using commercial off-the-shelf devices. The basic principle behind WiFi-based localization is that the movements of objects introduce reflection and refraction of wireless signals during transmission [11]. By establishing a nonlinear mapping between location coordinates and fingerprint signals, we can predict target positions. Modern WiFi network interface cards (NICs) make the extraction of CSI convenient in practical applications [12]. CSI is robust to changes in temperature, brightness, noise, and so on. Thus, utilizing CSI for behavior recognition and indoor localization has gained increasing attention from researchers.

Although CSI has been used successfully on various occasions, in most cases only the amplitude of the channel frequency response (CFR) is considered, and insufficient attention is paid to the phase. However, phase measurements also carry important information about reflected signals. Under non-line-of-sight (NLOS) transmission, amplitude measurements show higher randomness because they typically involve richer reflection, diffraction, and refraction effects [13]. In contrast, phase measurements change periodically over propagation distance, which makes them more robust. Amplitude signals alone therefore cannot fully reflect the CSI changes caused by the target, and using both phase and amplitude information for positioning makes the fingerprint richer and more robust. However, it is infeasible to use the raw phase directly for localization, since it contains many different random errors and wraps periodically over propagation distance. To deal with these issues, we propose PLAP, an indoor passive localization method that considers both amplitude and phase information. Specifically, PLAP employs a linear transformation to calibrate the phase measurements for localization. Moreover, to learn informative features from the amplitude and calibrated phase measurements, we develop a deep learning framework that combines a convolutional neural network (CNN) and a bidirectional gated recurrent unit (BGRU) to capture both spatial and temporal correlations in the CSI measurements. First, the CNN is employed to extract spatial correlations over the amplitude and phase measurements. Second, current LSTM-based models only consider past measurements for prediction, although future measurements also carry important information for localization. Therefore, we use the BGRU to learn representative temporal features in both directions from the sequential amplitude and phase measurements, considering both past and future information.

The main contributions of this paper are summarized as follows:
(1) To make full use of CSI data and realize robust localization, we employ both phase and amplitude information for localization. For the amplitude, we use a Hampel filter to remove outliers from the recorded measurements. The phase is unwrapped by a linear transformation, and a linear error function of the band offset is constructed to obtain the calibrated phase.
(2) We propose a deep learning model, CNN-BGRU, to learn both spatial and temporal representative features from amplitude and phase information. The proposed CNN-BGRU model consists of four modules: a CSI data integration and reconstruction module, a spatial feature learning module using a CNN, a time-series feature learning module using a BGRU, and an output prediction module for location estimation based on amplitude and phase information.
(3) We implement the PLAP system with a desktop computer and a router (TP-LINK WDR6500). By modifying the driver of the Intel 5300 NIC, we obtain the raw amplitude and phase information, and we verify the feasibility of PLAP through extensive experiments in three typical indoor environments.

The rest of this paper is organized as follows. Section 2 reviews related studies. Section 3 presents the design of the localization system. Section 4 describes the data preparation. Section 5 details the proposed CNN-BGRU localization model, and Section 6 describes the experiments and analysis. Finally, Section 7 summarizes the work.

2. Related Work

WiFi-based indoor localization methods are mainly divided into model-based and fingerprint-based methods [14]. Model-based methods use geometry to measure the distances to several known access points (APs), while fingerprint-based methods exploit the fact that the received signal exhibits different patterns at different positions. Model-based localization methods include the centroid determination method [15], angle of arrival (AOA), and time of arrival (TOA) [16].

Compared with model-based approaches, fingerprint-based methods reflect signal propagation more comprehensively over both LOS and NLOS paths. These approaches exploit the fact that the multipath propagation at each position is unique. Because it is readily available in hardware, RSS was widely used in earlier fingerprinting positioning systems. Radar [17] is the first fingerprinting system and adopts a deterministic algorithm based on RSS. Horus [18], which also utilizes RSS, achieves better localization accuracy than Radar. Unfortunately, RSS values change significantly over time because of multipath fading and shadowing. The error of RSS-based fingerprints can reach 10 dB, which lowers accuracy [19]. Therefore, RSS-based fingerprints suffer from poor positioning performance [20].

Unlike RSS, CSI provides richer per-subcarrier information, which is helpful for indoor localization. In the past few years, many studies on CSI-based indoor localization have been presented. Jin’s team [21] utilizes vectors of approximated channel impulse response amplitudes as fingerprints. The PinLoc scheme shows that using CSI amplitude signals for indoor localization is feasible in different scenes [22]. Zhou’s team proposed an equipment-independent algorithm using CSI, which uses a support vector machine (SVM) to convert the localization problem into a regression task [23]. This method establishes a nonlinear mapping between the CSI fingerprints and the target’s locations, so that the target position can be estimated from the corresponding CSI fingerprint. In [24], a localization method based on a mixture of CSI and RSS is proposed, which achieves better accuracy than methods using CSI or RSS alone. Later, Wang et al. designed an autoencoder network that automatically learns discriminative features from wireless signals and then fused these features into a softmax-based machine learning framework to achieve localization and gesture recognition [25].

In the FIFS scheme, the CSI values of multiple antennas are weighted to improve the positioning accuracy [12]. Deep-Fi [26] is a deep-learning-based indoor localization scheme that exploits the characteristics of CSI amplitude measurements from all antennas. Phase-Fi [27] is a deep learning indoor localization method that calibrates the phase information of CSI. It extracts phase information from multiple antennas and multiple subcarriers of the NIC, and can thus obtain useful phase information from different channels. However, CSI data are easily affected by environmental changes, so achieving localization with high precision and robustness is still a challenge.

3. Preliminaries and the Proposed System Architecture

In this section, the introduction of Channel State Information and the framework of the proposed PLAP are presented.

3.1. Channel State Information

MIMO-OFDM is widely used in wireless communication systems to mitigate multipath effects. At present, fine-grained physical layer information of the transmitter and receiver can easily be acquired through wireless network cards that support the IEEE 802.11n standard. CSI provides both amplitude and phase information for multiple subcarriers.

In this paper, we use WiFi signals in the 2.4 GHz band. We establish the channel model of the OFDM system in the 20 MHz mode, which can be expressed as follows:

$$Y = HX + N,$$

where $N$ is the Gaussian noise, $Y$ and $X$ denote the received and transmitted signal matrices, and $H$ represents the CFR. The CFR of the $i$-th subcarrier, $H_i$, is

$$H_i = |H_i| e^{j \angle H_i},$$

where $|H_i|$ and $\angle H_i$ are the amplitude and phase of the CFR of the $i$-th subcarrier. In the proposed positioning system, we extract 30 subcarriers from the OFDM system, each of which provides amplitude and phase measurements. Nevertheless, the raw phase changes periodically and can hardly meet the requirements of indoor localization. Therefore, we use the calibrated phase together with the preprocessed amplitude to form the fingerprint of the target location.
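As a minimal sketch of the decomposition above, the amplitude and raw phase of each subcarrier can be read directly from the complex CFR samples. The function name `amplitude_and_phase` and the three sample values are illustrative assumptions, not part of the PLAP implementation or of any measured data.

```python
import cmath

def amplitude_and_phase(csi_row):
    """Split one packet's complex CFR samples into amplitude and raw phase lists."""
    amplitudes = [abs(h) for h in csi_row]
    # cmath.phase returns the angle in radians within [-pi, pi], i.e., still wrapped.
    phases = [cmath.phase(h) for h in csi_row]
    return amplitudes, phases

# Hypothetical CFR values for three subcarriers (illustrative, not measured data).
example_csi = [3 + 4j, 1j, -2 + 0j]
amps, phases = amplitude_and_phase(example_csi)
```

The phases obtained this way are wrapped, which is why a separate calibration step is needed before they can serve as a fingerprint.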

3.2. System Architecture

The architecture of the proposed PLAP localization system is presented in Figure 1. Without attaching any device to the targets, the proposed system only uses a desktop computer acting as the receiver and a TL-WDR6500 router as the transmitter. The whole system is divided into two parts: the offline stage and the online stage.

In the offline stage, the amplitude and phase information is collected and the radio map is built. The amplitude and phase of the CSI are extracted and preprocessed by Hampel filtering and linear transformation, respectively. After calibration, we fuse the amplitude and phase into a new “fingerprint” for passive indoor positioning. Next, a CNN-BGRU model is trained on the calibrated amplitude and phase in the training dataset to estimate the location of targets. The details of each part are given in the following sections.

4. Data Collection and Sanitization

Assume that the experimental environment contains a set of reference positions. The radio map consists of the CSI amplitude and phase information and the corresponding coordinates of each training point. The amplitude and phase measurements of each point are recorded per antenna, where the antenna index takes the values 0, 1, and 2. The calibration of the amplitude and phase is conducted as follows.

4.1. Amplitude Denoising

CSI amplitudes reflect the CFRs under multipath effects and channel fading. To better reflect the real amplitude characteristics, reduce the influence of the noise inherent in the dynamic environment and the equipment, and accelerate the fitting of the neural network, it is necessary to preprocess the acquired CSI signal. Figure 2 shows the amplitude signals from all antennas in one environment.

First, we can see from Figure 2 that the amplitude attenuation differs across paths. Therefore, the amplitude on a single antenna does not adequately reflect the location characteristics. This paper thus utilizes all three antennas, with 90 subcarriers in total, which greatly increases the discrimination between location points and results in higher localization accuracy. In addition, the figure shows that there are outliers in the amplitude measurements; as a result, the raw amplitude measurements cannot be used to estimate the target position effectively. It is therefore necessary to filter the measurements and eliminate the outliers.

To detect and remove outliers, we preprocess the amplitude with four widely used filtering algorithms, i.e., the Hampel filter, the Butterworth filter, the discrete wavelet transform (DWT), and a low-pass filter. Figure 3 shows the raw amplitude of 1000 packets. Figures 4–7 show the filtered amplitude data using the Hampel filter, Butterworth filter, DWT, and low-pass filter, respectively. As shown in these figures, the amplitude preprocessed with the Hampel filter is relatively stable, demonstrating that the Hampel filter is more effective at eliminating the environmental noise that affects positioning accuracy. Therefore, we choose the Hampel filter as the preprocessing algorithm for amplitude measurements. The Hampel identifier works on a sliding window and treats values outside the interval $[\mu - \gamma\sigma, \mu + \gamma\sigma]$ as outliers, where $\mu$ and $\sigma$ denote the median and the median absolute deviation (MAD) of the observed values, and $\gamma$ denotes the multiple of the standard deviation [28]. The MAD is used to estimate the standard deviation around the median within each window. If a sample differs from the median by more than three standard deviations, the sample is replaced by the median. More details of the Hampel filter can be found in [29]. In our case, according to the experimental analysis, the window size is set to 100 and $\gamma$ to 3.

We preprocess the amplitude information with the Hampel filter to produce denoised amplitude information, which effectively describes the characteristics of each location, and then update the fingerprint database.
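The sliding-window outlier replacement described in this subsection can be sketched as follows. This is an illustrative sketch rather than the authors' code: the centered window, the constant 1.4826 that scales the median absolute deviation to a standard-deviation estimate, and the sample trace are all assumptions introduced here.

```python
from statistics import median

def hampel_filter(values, window_size=100, n_sigmas=3.0):
    """Replace outliers with the local median computed over a sliding window."""
    half = window_size // 2
    scale = 1.4826  # assumption: scales the MAD to a sigma estimate for Gaussian data
    filtered = list(values)
    for i, x in enumerate(values):
        # Centered window, truncated at the edges of the trace (an assumption).
        window = values[max(0, i - half):min(len(values), i + half + 1)]
        mu = median(window)
        sigma = scale * median(abs(v - mu) for v in window)
        if abs(x - mu) > n_sigmas * sigma:
            filtered[i] = mu
    return filtered

# Hypothetical amplitude trace with one spike (illustrative, not measured data).
amplitude_trace = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8, 10.1, 10.2, 9.9]
cleaned = hampel_filter(amplitude_trace, window_size=6, n_sigmas=3.0)
```

With the settings reported above, the call would use `window_size=100` and `n_sigmas=3.0` on the amplitude sequence of each subcarrier.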

4.2. Phase Sanitization

Although CSI phase information can easily be obtained from the Intel 5300 network card, the raw phase cannot be used directly for indoor localization. Because of carrier and sampling frequency offsets, useful information is easily masked in the phase measurements. Figure 8 shows that the measured phases of individual subcarriers are shifted during the acquisition process. Thus, we first need to make the phase information usable for localization. To this end, we apply a linear transformation to correct the phases and reduce random phase shifts. Let $\hat{\phi}_i$ represent the measured phase of the $i$-th subcarrier:

$$\hat{\phi}_i = \phi_i - 2\pi \frac{k_i}{N_{\mathrm{FFT}}} \delta + \beta + Z,$$

where $\phi_i$ is the real phase, $k_i$ is the subcarrier index of the $i$-th subcarrier (ranging from −28 to 28), $\delta$ denotes the timing offset, $\beta$ denotes the phase offset, and $Z$ denotes measurement noise. $N_{\mathrm{FFT}}$ is the size of the FFT in IEEE 802.11n [30].

The proposed linear transformation consists of two steps. The first step is linear straightening, in which the raw phases are unwrapped. In the second step, the defined linear error function is subtracted from the unwrapped phase values to obtain the calibrated phase. Figure 9 plots the unwrapped CSI phase values for the three antennas of the receiver. It can be seen that the channel frequency responses of the antennas differ greatly and that, as the subcarrier index increases, the phases of all antennas gradually decrease. This shows that unwrapping eliminates the periodicity of the raw phase and enhances the discrimination of the phase data.

Removing the timing offset and the phase offset is crucial for phase correction. First, the slope $a$ and the offset $b$ of the phase across the $n$ subcarriers are defined as follows:

$$a = \frac{\hat{\phi}_n - \hat{\phi}_1}{k_n - k_1} = \frac{\phi_n - \phi_1}{k_n - k_1} - \frac{2\pi}{N_{\mathrm{FFT}}}\delta,$$

$$b = \frac{1}{n}\sum_{j=1}^{n}\hat{\phi}_j = \frac{1}{n}\sum_{j=1}^{n}\phi_j - \frac{2\pi\delta}{n N_{\mathrm{FFT}}}\sum_{j=1}^{n}k_j + \beta.$$

Since the subcarrier frequencies are symmetric in the IEEE 802.11n standard, the sum of the subcarrier indices is 0, so $\sum_{j=1}^{n} k_j = 0$. Thus, $b$ can be written as $b = \frac{1}{n}\sum_{j=1}^{n}\phi_j + \beta$. To obtain the calibrated phase, the linear part $a k_i + b$ is subtracted from the raw phase, which gives $\tilde{\phi}_i = \hat{\phi}_i - a k_i - b = \phi_i - \frac{\phi_n - \phi_1}{k_n - k_1} k_i - \frac{1}{n}\sum_{j=1}^{n}\phi_j$ (where the small measurement noise $Z$ is ignored). The detailed procedure of phase calibration is shown in Table 1. Figure 10 shows the calibrated phases on different antennas. The positioning features of the target also differ significantly across antennas. In addition, Figure 11 shows the raw measured phases and the calibrated phase values in polar coordinates for 150 CSI packets on the 8th subcarrier. The raw phase values scatter randomly between 0° and 360°. Because of this randomness, the raw phase cannot be used directly for indoor positioning. The calibrated phases are a specific linear conversion of the phase of each subcarrier on the same antenna, which effectively prevents phase jumps and concentrates the values in a small sector (330° to 0°). Therefore, the phase shift can be eliminated by the linear transformation, and the calibrated phase helps to improve indoor localization accuracy.
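The two-step calibration (unwrapping, then subtracting the linear term $a k_i + b$) can be illustrated with the short sketch below. It is a sketch under stated assumptions: the subcarrier index set is a hypothetical symmetric one chosen so that the indices sum to zero (the index list actually reported by the NIC may differ), the "true" phase is synthetic, and the helper names are introduced here for illustration only.

```python
import math

def unwrap(phases):
    """Remove 2*pi jumps so that neighbouring samples differ by at most pi."""
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        delta -= 2 * math.pi * round(delta / (2 * math.pi))
        out.append(out[-1] + delta)
    return out

def calibrate_phase(raw_phases, indices):
    """Unwrap the raw phases, then subtract the linear term a * k_i + b."""
    unwrapped = unwrap(raw_phases)
    a = (unwrapped[-1] - unwrapped[0]) / (indices[-1] - indices[0])  # slope
    b = sum(unwrapped) / len(unwrapped)                              # mean offset
    return [p - a * k - b for p, k in zip(unwrapped, indices)]

# Hypothetical symmetric index set with 30 entries that sum to zero.
subcarrier_idx = list(range(-29, 0, 2)) + list(range(1, 30, 2))
# Synthetic "true" phase, used only to exercise the functions.
true_phase = [0.3 * math.sin(0.2 * k) for k in subcarrier_idx]

def simulate_measurement(delta, beta, fft_size=64):
    """Apply a timing offset delta and a phase offset beta, then wrap the result."""
    shifted = [t - 2 * math.pi * k / fft_size * delta + beta
               for t, k in zip(true_phase, subcarrier_idx)]
    return [math.atan2(math.sin(x), math.cos(x)) for x in shifted]

calibrated_a = calibrate_phase(simulate_measurement(3.0, 0.7), subcarrier_idx)
calibrated_b = calibrate_phase(simulate_measurement(-2.0, 2.1), subcarrier_idx)
```

In this synthetic setting, the two calibrated sequences coincide even though the simulated timing and phase offsets differ, which is the property the linear transformation is meant to provide.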

Using the linear transformation algorithm, we calibrate the raw phase and generate the calibrated phase information. The calibrated phase effectively reflects the characteristics of different locations under NLOS paths, which leads to a better description of the complex radio propagation environment, and it is used to update the fingerprint database. A new two-dimensional array consisting of the denoised amplitude and the calibrated phase is used as the input of the proposed localization model CNN-BGRU. The three antennas of the 5300 card together provide 90 subcarriers (30 per antenna), so the dimension of the data is determined by the number of reference positions, the number of packets collected, and the 90 subcarriers.

5. Proposed CNN-BGRU Positioning Model

The main idea of the CNN-BGRU positioning framework is to extract and learn multidimensional features from the CSI data with a compound network model. The framework is shown in Figure 12. It is composed of three parts, namely, the spatial feature learning module, the time-series feature learning module, and the output prediction module.

5.1. Spatial Feature Learning Module

CNNs are well suited to learning spatial features: their weight-sharing structure reduces both the complexity of the network model and the number of weights. We employ a CNN as the feature extractor to learn the spatial features of the amplitude and phase measurements at each time step. The architecture of the CNN is shown in Table 2. The spatial features learned by the CNN are fed into the BGRU for further temporal feature learning.

5.2. Temporal Feature Learning Module

The BGRU is used to learn the time-related features of the CSI measurements at different time steps. Owing to its sequential modeling capability, long short-term memory (LSTM) has been successfully applied to CSI-based sensing. However, conventional LSTM suffers from the vanishing gradient problem with long-term dependencies. To solve this problem, Cho et al. [31] developed the GRU, which is a slight variation of LSTM. Compared with vanilla LSTM, the GRU makes two major changes: first, it combines the forget and input gates into a single “update gate”; second, it merges the cell state and the hidden state. The resulting model is simpler than the standard LSTM model. To give a clear illustration, a single GRU cell is shown in Figure 13.

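As shown in Figure 13, the GRU consists of two gates, i.e., a reset gate $r_t$ and an update gate $z_t$. The reset gate determines how to combine the current input with the historic memory. The update gate decides how much of the historic memory should be maintained in the node. The reset gate and update gate are computed by the following equations:

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r),$$

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z),$$

where $x_t$ is the input at time $t$, $h_{t-1}$ is the previous hidden state, $\sigma$ is the sigmoid function, and $W$, $U$, and $b$ are learnable weights and biases.

The hidden state $h_t$ of the GRU at time $t$ is given by the previous hidden state $h_{t-1}$ and the candidate hidden state $\tilde{h}_t$ as follows:

$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h),$$

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t,$$

where $\odot$ denotes element-wise multiplication.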

The GRU network is simple yet effective and can be regarded as a light version of LSTMs in terms of computation cost and complexity.
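To make the gate computations concrete, the following sketch evaluates a single GRU step with plain Python lists. It assumes the convention in which the update gate weights the previous hidden state, and the parameter layout (a dictionary with keys such as `W_r` and `U_r`) and the toy weights are hypothetical choices made for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, x, U, h, b):
    """Compute W x + U h + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W[j], x))
        + sum(u * hi for u, hi in zip(U[j], h))
        + b[j]
        for j in range(len(b))
    ]

def gru_step(x, h_prev, p):
    """One GRU step; p maps names like 'W_r' to weights (hypothetical layout)."""
    r = [sigmoid(v) for v in affine(p["W_r"], x, p["U_r"], h_prev, p["b_r"])]
    z = [sigmoid(v) for v in affine(p["W_z"], x, p["U_z"], h_prev, p["b_z"])]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]  # reset gate applied to memory
    h_cand = [math.tanh(v)
              for v in affine(p["W_h"], x, p["U_h"], gated, p["b_h"])]
    # Update gate z keeps the old state; (1 - z) admits the candidate state.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, h_cand)]

# Toy weights for a 2-dimensional input and a 2-dimensional hidden state.
params = {
    "W_r": [[0.5, -0.3], [0.1, 0.2]], "U_r": [[0.4, 0.0], [0.0, 0.4]], "b_r": [0.0, 0.0],
    "W_z": [[0.2, 0.1], [-0.1, 0.3]], "U_z": [[0.3, 0.0], [0.0, 0.3]], "b_z": [0.0, 0.0],
    "W_h": [[0.6, -0.2], [0.3, 0.5]], "U_h": [[0.2, 0.1], [0.1, 0.2]], "b_h": [0.0, 0.0],
}
x_t = [1.0, -1.0]
h_prev = [0.2, -0.1]
h_next = gru_step(x_t, h_prev, params)
```

In practice a deep learning library would supply this cell; the sketch only mirrors the roles of the reset and update gates described above.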

Nevertheless, the conventional GRU network processes a finite sequence in only one direction, which means that the current hidden state is generated from past information only. To incorporate both past and future information, we employ a bidirectional gated recurrent unit (BGRU) network to generate hidden states in both directions. Unlike the GRU, the BGRU network consists of two parallel layers propagating in opposite directions, i.e., a forward layer and a backward layer, as shown in Figure 14.

In the BGRU, the hidden state at time step $t$ is defined as the concatenation of the states of the two directions:

$$h_t = \overrightarrow{h}_t \oplus \overleftarrow{h}_t,$$

where $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ denote the output vectors of the forward and backward layers, respectively, the arrows $\rightarrow$ and $\leftarrow$ represent the forward and backward processes, respectively, and $\oplus$ denotes concatenation.
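The concatenation of forward and backward states can be sketched independently of the recurrent cell. In the sketch below, `bidirectional_states` and the toy running-sum step function are hypothetical stand-ins; in the BGRU, each direction would use its own GRU cell with separate weights.

```python
def bidirectional_states(sequence, step_fwd, step_bwd, h0):
    """Run a forward and a backward recurrent pass and concatenate the states."""
    forward, h = [], h0
    for x in sequence:
        h = step_fwd(x, h)
        forward.append(h)
    backward, h = [], h0
    for x in reversed(sequence):
        h = step_bwd(x, h)
        backward.append(h)
    backward.reverse()  # align backward states with the original time order
    # List concatenation plays the role of joining the two state vectors.
    return [f + b for f, b in zip(forward, backward)]

def running_sum(x, h):
    """Toy recurrent step with a one-dimensional state (illustrative only)."""
    return [h[0] + x]

states = bidirectional_states([1, 2, 3], running_sum, running_sum, [0.0])
```

Each element of `states` holds the forward state followed by the backward state for the same time step, so the state at every step depends on both earlier and later inputs.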

5.3. Location Prediction Module

The location prediction module is mainly composed of a max pooling layer and a fully connected layer. The extracted spatial and temporal features are fed into this module to estimate the target position. The mean squared error (MSE) loss is employed to train the CNN-BGRU model.

6. Experiments and Analysis

6.1. Experimental Setting

In our experiments, a commercial TP-Link WDR6500 WiFi router was used as the transmitter and placed on a workbench at a height of 1.2 meters in the experimental environment. The receiver is an HP-800G4 i7 computer. We modified the wireless driver of the computer using CSI tools and installed Ubuntu Linux 12.04 LTS on it. The commercial equipment in this paper uses a 20 MHz bandwidth in the 2.4 GHz frequency band for CSI data acquisition. The Intel 5300 network card is equipped with three 8 dB antennas, which are fixed on a metal tripod at a height of 1.2 meters. We set the sampling rate to 20 packets per second. At each reference point, 2000 packets are collected, and the distance between adjacent points is 0.6 m.

Our experimental environments are located in a university building and include three typical indoor scenarios, i.e., two indoor laboratory areas (11 m × 7 m and 8 m × 5.6 m) and one corridor area (8 m × 2 m). Figure 15 plots the floor plan of laboratory A with 45 training points and 12 testing points. Figure 16 shows the layout of laboratory B with 24 training points and 7 testing points. In addition, Figure 17 plots the layout of the corridor with 27 training points and 7 testing points.

6.2. Ablation Study

To evaluate the contribution of each module (CNN and GRU), we conducted ablation studies on the datasets collected in the three environments. Specifically, we verify the effectiveness of each module with the following models:
(i) CNN [32]: in this case, we only employ the CNN to extract features
(ii) GRU [33]: in this case, we only use the GRU as the feature extraction model

Tables 3–5 show the experimental results. From these tables, we can draw the following conclusion: when only the CNN or only the GRU network is used for localization, the RMSE is higher than that of our proposed model, which indicates that the proposed model extracts both spatial and temporal representations from the CSI measurements. Overall, the results show that the proposed method effectively reduces the localization error.

6.3. Localization Performance

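To evaluate the positioning performance of the proposed PLAP system, the root mean square error (RMSE) is used to assess the experimental results:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{i=1}^{T}\left[(\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2\right]},$$

where RMSE represents the estimated error over all test points, $T$ denotes the number of testing points, $(\hat{x}_i, \hat{y}_i)$ denote the predicted positions, and $(x_i, y_i)$ denote the real positions.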
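The metric can be computed as in the following sketch, which assumes two-dimensional coordinates in meters. The function name `rmse` and the sample coordinates are hypothetical and are not results from the experiments.

```python
import math

def rmse(predicted, actual):
    """Root mean square of the Euclidean distances between predicted and true points."""
    squared = [
        (px - ax) ** 2 + (py - ay) ** 2
        for (px, py), (ax, ay) in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predicted and true positions in meters (illustrative only).
predicted_xy = [(3.0, 4.0), (0.0, 0.0)]
true_xy = [(0.0, 0.0), (0.0, 0.0)]
error = rmse(predicted_xy, true_xy)
```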

We compared PLAP with six state-of-the-art models, including Random Forest (RF) [34], Extreme Learning Machine (ELM) [35], K-Nearest Neighbor (KNN) [36], and Deep Neural Network (DNN) [37]. During the evaluation, the same dataset is used for all algorithms. Tables 6–8 present the performance of all algorithms in the three experimental scenarios.

Table 6 shows that the RMSE of PLAP is 1.55 meters in laboratory A and that the proposed PLAP outperforms the other methods in terms of localization accuracy. Table 7 shows that our system achieves an average error of about 1 meter in laboratory B, which is better than the other algorithms. For the corridor, the localization results are shown in Table 8: the mean localization error of PLAP is 1.053 meters, and its standard deviation is also the lowest, which is the best performance among the compared algorithms. Note that although the execution time of PLAP is not the lowest, it is computed over all testing samples, so PLAP can still satisfy real-world applications.

Figure 18 plots the cumulative distribution function (CDF) of the localization error for all algorithms in laboratory A. As shown in Figure 18, more than 80% of the localization errors of PLAP are under 2.3 meters, while the other schemes have larger localization errors.

Figures 19 and 20 plot the CDFs of all schemes in laboratory B and the corridor, respectively. In Figure 19, more than 80% of the localization errors of our system are under 1.6 m, while the corresponding errors of other schemes such as DNN and RF exceed 1.9 meters. Similarly, Figure 20 demonstrates the superiority of the proposed PLAP over the other baselines.
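Statements such as "more than 80% of the errors are under a given distance" are read off the empirical CDF of the per-point errors. The sketch below shows one way to compute both directions of that reading; the function names and the error values are hypothetical and do not reproduce the reported results.

```python
import math

def fraction_within(errors, threshold):
    """Empirical CDF value: share of errors that do not exceed the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)

def error_at_fraction(errors, fraction):
    """Smallest observed error e such that at least `fraction` of errors are <= e."""
    ordered = sorted(errors)
    # The small tolerance guards against floating-point round-up in the product.
    rank = max(1, math.ceil(fraction * len(ordered) - 1e-9))
    return ordered[rank - 1]

# Hypothetical per-point localization errors in meters (illustrative only).
errors_m = [0.4, 0.9, 1.1, 1.5, 2.8]
share = fraction_within(errors_m, 1.5)
e80 = error_at_fraction(errors_m, 0.8)
```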

Figures 21–23 show how the loss changes during training in the three environments. It can be seen that the loss of the proposed model converges stably to its minimum value at a decent rate.

7. Conclusion

We propose PLAP, an indoor positioning method using CNN-BGRU to improve the performance of CSI-based localization. A Hampel filter is used to process the raw amplitude signals, and a linear transformation is used to calibrate the phases. To extract informative features for different locations, a deep network framework, CNN-BGRU, is designed to learn discriminative features from the calibrated amplitude and phase information. The experimental results show that the proposed PLAP achieves better localization performance than the baselines.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by National Natural Science Foundation of China (Grant nos. 51974179 and 52004150), Natural Science Foundation of Shandong Province (Grant nos. ZR2019MEE118 and ZR2019BEE067), and Qingdao Science and Technology Plan Project (Grant no. 19-3-2-6-zhc).