The Influence of Data Length on the Performance of Artificial Intelligence Models in Predicting Air Pollution, Advances in Meteorology

Advances in Meteorology (IF 2.9), Pub Date: 2022-09-30, DOI: 10.1155/2022/5346647
Mohamed Khalid AlOmar ₁, Faidhalrahman Khaleel ₁, Abdulwahab Abdulrazaaq AlSaadi ₂, Mohammed Majeed Hameed ₁,₃, Mohammed Abdulhakim AlSaadi ₄, Nadhir Al-Ansari ₅

Air pollution is one of the most critical environmental issues facing humanity and remains a contentious topic in several countries worldwide. Accurate prediction is therefore critical for human health management and for government decision-making in environmental management. In this study, three artificial intelligence (AI) approaches, namely the group method of data handling neural network (GMDHNN), the extreme learning machine (ELM), and the gradient boosting regression (GBR) tree, are used to predict the hourly concentration of PM₂.₅ at the Dorset station in Canada. The investigation quantifies the effect of data length on AI modeling performance. Accordingly, nine training/testing ratios (50/50, 55/45, 60/40, 65/35, 70/30, 75/25, 80/20, 85/15, and 90/10) are used to split the data into training and testing datasets and to assess the performance of the applied models. The results showed that the data division significantly affected the predictive capacity of the models, and the 60/40 ratio was found to be more suitable for developing predictive models. Furthermore, the ELM model provided more precise predictions of PM₂.₅ concentrations than the other models. A key feature of the ELM model is its ability to adapt to changes in the ratio of training to testing data. In summary, the results demonstrate an efficient method for selecting the optimal dataset ratio and the best-performing AI model, which would be helpful in designing accurate models for other environmental problems.
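The split-ratio experiment described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the series is synthetic (the Dorset data are not reproduced here), the split is assumed to be chronological, RMSE is an assumed error metric, and a one-lag linear regression stands in as a placeholder for GMDHNN, ELM, or GBR. All function names are hypothetical.

```python
import math
import random

# The nine train/test ratios listed in the abstract, as training fractions.
TRAIN_FRACTIONS = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90]


def make_synthetic_series(n_hours=2000, seed=0):
    """Hypothetical stand-in for hourly PM2.5 data: daily cycle plus noise."""
    rng = random.Random(seed)
    return [10.0 + 4.0 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 1.0)
            for t in range(n_hours)]


def chronological_split(series, train_fraction):
    """Split without shuffling (assumption): the test set follows the training set in time."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


def fit_lag1_linear(train):
    """Placeholder model: least-squares fit of y[t] = a + b * y[t-1]."""
    x, y = train[:-1], train[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b


def rmse_on(series, a, b):
    """Root mean squared error of one-step-ahead predictions on a series."""
    errors = [(a + b * prev - cur) ** 2
              for prev, cur in zip(series[:-1], series[1:])]
    return math.sqrt(sum(errors) / len(errors))


def evaluate_all_splits(series):
    """Return {train_fraction: (n_train, n_test, test_rmse)} for every ratio."""
    results = {}
    for frac in TRAIN_FRACTIONS:
        train, test = chronological_split(series, frac)
        a, b = fit_lag1_linear(train)
        results[frac] = (len(train), len(test), rmse_on(test, a, b))
    return results


if __name__ == "__main__":
    results = evaluate_all_splits(make_synthetic_series())
    for frac, (n_train, n_test, rmse) in results.items():
        print(f"{round(frac * 100)}/{round((1 - frac) * 100)}: "
              f"train={n_train} test={n_test} RMSE={rmse:.3f}")
```

Comparing the test error across the nine rows is the kind of comparison the study reports. With a real dataset, `fit_lag1_linear` would be replaced by the model under study, and the numbers from this synthetic series say nothing about which ratio is best in practice.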

Updated: 2022-09-30
