research-article

Just Accepted

A Natural Language Processing System for Text Classification Corpus Based on Machine Learning

Author: Yawen Su (ORCID: 0009-0005-9387-013X)

ACM Transactions on Asian and Low-Resource Language Information Processing
Accepted: January 2024
https://doi.org/10.1145/3648361

Online AM: 19 February 2024

Abstract

A classification system for hazard sources in air traffic control was investigated using the Human Factors Analysis and Classification System (HFACS) framework and natural language processing, with the aim of preventing hazardous situations in air traffic control. An air traffic control hazard classification system is built on the HFACS standard. Hazard data from the aviation safety management system are selected as the corpus, then classified and labeled at 5 levels. A TF-IDF/TextRank text classification method based on key content extraction, together with text classification models based on CNN and BERT, was used in the experiments to address the small samples, many labels, and random samples that characterize hazard data in air traffic control. The results show that the combined trade-off between model training time and classification accuracy is best when about 8 keywords are extracted. As the number of keywords increases, the time spent decreases, which affects accuracy. When the number reaches about 93, the time spent increases while classification accuracy remains close to 0.7, so the higher time cost reduces the overall benefit. The results indicate that extracting key content can address text classification problems with small samples and can contribute to further research on the development of safety systems.
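The keyword-extraction step named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `tfidf_keywords` and `textrank_keywords`, the smoothed IDF formula, the co-occurrence window, the damping factor, and the toy token lists are all assumptions made for illustration. Only the default of 8 keywords is taken from the abstract.

```python
import math
from collections import Counter, defaultdict


def tfidf_keywords(docs, k=8):
    """Return the top-k TF-IDF terms for each tokenized document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)), so a term that
    occurs in every document scores 0.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


def textrank_keywords(tokens, k=8, window=2, damping=0.85, iterations=50):
    """Return the top-k words ranked by a TextRank-style score.

    Assumption: an undirected, unweighted co-occurrence graph and a fixed
    number of power-iteration steps.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    rank = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's old score.
        rank = {
            word: (1 - damping)
            + damping * sum(rank[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    ranked = sorted(rank, key=lambda w: (-rank[w], w))
    return ranked[:k]


# Hypothetical hazard reports, already tokenized.
reports = [
    ["runway", "incursion", "controller", "fatigue", "runway"],
    ["controller", "handover", "radio", "failure"],
    ["radio", "frequency", "congestion", "controller"],
]
print(tfidf_keywords(reports, k=3))
print(textrank_keywords(reports[0], k=3))
```

In a setup like this, the shortened keyword lists rather than the full reports would be passed to the downstream classifier, which is where a smaller `k` would be expected to shorten training time at some cost in accuracy.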

Index Terms

1. Applied computing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees

Index terms have been assigned to the content through auto-classification.

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing Just Accepted
ISSN:2375-4699
EISSN:2375-4702

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Online AM: 19 February 2024
- Accepted: 31 January 2024
- Revised: 24 December 2023
- Received: 30 October 2023
Author Tags
Safety social engineering
Air traffic control system
Hazard sources
HFACS model
TFIDF TextRank method
SVM optimization