Machine Learning Techniques for Anomaly Detection in High-Frequency Time Series of Wind Speed and Greenhouse Gas Concentration Measurements

  • MACHINE LEARNING IN NATURAL SCIENCES
Moscow University Physics Bulletin

Abstract

Fluxes of greenhouse gases (GHG) can be assessed in situ with the eddy covariance method, which processes high-frequency measurements of gas concentration and wind speed acquired at dedicated sites, e.g., the carbon measurement test areas of the pilot project of the Ministry of Education and Science of Russia. The measurements commonly contain noise, anomalies, and gaps of various origins, and these anomalies bias GHG flux estimates. A number of empirical and heuristic approaches exist for filtering noise and anomalies and for gap-filling. These approaches have many tuning parameters that are commonly adjusted by an expert, which limits large-scale deployment of GHG monitoring stations. In this study, we propose an alternative approach to anomaly detection in high-frequency measurements of GHG concentration and wind speed. The approach is based on machine learning techniques and has fewer tuning parameters. The goal of our study is to develop a fully automated data preprocessing routine based on machine learning algorithms. We collected a dataset of high-frequency GHG concentration and wind speed measurements from one of the carbon measurement test areas. To compare anomaly detection algorithms, we labeled anomalies in a subset of this dataset. We present two approaches to anomaly detection: (a) identification of outliers based on the magnitude of the error of statistical time series forecasts made by a machine learning (ML) algorithm; and (b) classification of anomalies using an ML model trained on the labeled subset mentioned above. We compared the approaches and algorithms using the F1-score computed against the expert-labeled subset of anomalies in the GHG concentration and wind speed time series.
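To illustrate why anomalies bias flux estimates, the following minimal sketch computes an eddy covariance flux as the covariance of the fluctuations of vertical wind speed and gas concentration. It assumes the textbook simplification (flux proportional to the mean product of fluctuations) and omits the coordinate rotation, detrending, and density corrections used in practice. The function name `eddy_flux` and the toy samples are hypothetical and are not taken from the study.

```python
from statistics import fmean


def eddy_flux(w, c):
    """Return the covariance mean(w' * c') of two equally long series."""
    if len(w) != len(c) or len(w) == 0:
        raise ValueError("w and c must be non-empty and of equal length")
    w_mean, c_mean = fmean(w), fmean(c)
    # Fluctuations are deviations from the averaging-period mean.
    return fmean([(wi - w_mean) * (ci - c_mean) for wi, ci in zip(w, c)])


# Hypothetical toy samples, not data from the study.
w = [0.1, -0.2, 0.3, -0.1, 0.0, -0.1]           # vertical wind speed, m/s
c = [400.0, 400.4, 399.8, 400.2, 400.0, 400.2]  # gas concentration, arbitrary units

c_spiked = list(c)
c_spiked[2] = 450.0  # one injected spike in the concentration series

flux_clean = eddy_flux(w, c)
flux_spiked = eddy_flux(w, c_spiked)
print(flux_clean, flux_spiked)
```

In this toy case a single spike is enough to flip the sign of the covariance, which is why anomalies have to be removed before fluxes are computed.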
Within the forecast-error-based approach, we trained several models: the ARIMA autoregressive model, a CatBoost model for autoregression, a CatBoost model for forecasting with additional features, and an LSTM artificial neural network. Within the supervised classification approach, we tested a CatBoost classification model. We demonstrate that the ML forecasting models deliver high-quality time series predictions within the autoregression approach. We also show that the anomaly identification method based on the autoregression approach performs best, with the F1-score reaching 0.812.
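The forecast-error idea can be sketched as follows: forecast each point from its recent past, flag points whose forecast error is unusually large, and score the flags against expert labels with the F1-score. This is an illustration under stated assumptions, not the authors' pipeline. A rolling median stands in for the ARIMA, CatBoost, and LSTM forecasters, and the threshold rule (`k` times the median absolute error), the function names, and the toy series are all hypothetical.

```python
from statistics import median


def rolling_forecast(series, window=3):
    """One-step-ahead forecast: median of the previous `window` values (stand-in model)."""
    return [median(series[t - window:t]) for t in range(window, len(series))]


def detect_anomalies(series, window=3, k=5.0):
    """Flag points whose absolute forecast error exceeds k times the median absolute error."""
    forecasts = rolling_forecast(series, window)
    errors = [abs(x - f) for x, f in zip(series[window:], forecasts)]
    scale = median(errors) or 1e-12  # guard against a zero error scale
    # The first `window` points have no forecast and are left unflagged.
    return [False] * window + [e > k * scale for e in errors]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy series with one labeled spike at index 5.
series = [10.0, 10.1, 9.9, 10.2, 10.0, 25.0, 10.1, 9.8, 10.0, 10.2]
labels = [i == 5 for i in range(len(series))]

flags = detect_anomalies(series)
print(f1_score(labels, flags))
```

A median is used here instead of a mean so that a spike inside the look-back window does not distort the forecasts that follow it; a learned forecaster would need a comparable safeguard or a cleaned training set.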

Funding

The methods for unsupervised anomaly detection in time series were developed with support from the program FMWE-2022-0002. The methods for supervised anomaly detection in time series were supported by the Kazan Federal University Strategic Academic Leadership Program (PRIORITY-2030).

Author information

Corresponding author

Correspondence to A. J. Kasatkin.

Ethics declarations

The authors of this work declare that they have no conflicts of interest.

Additional information

Publisher’s Note. Allerton Press remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kasatkin, A.J., Krinitskiy, M.A. Machine Learning Techniques for Anomaly Detection in High-Frequency Time Series of Wind Speed and Greenhouse Gas Concentration Measurements. Moscow Univ. Phys. 78 (Suppl 1), S138–S148 (2023). https://doi.org/10.3103/S0027134923070135

  • DOI: https://doi.org/10.3103/S0027134923070135
