research-article
Just Accepted

A Natural Language Processing System for Text Classification Corpus Based on Machine Learning

Online AM: 19 February 2024

Abstract

A hazard classification system for air traffic control was investigated using the Human Factors Analysis and Classification System (HFACS) framework and natural language processing, with the aim of preventing hazardous situations in air traffic control. Building on the HFACS standard, an air traffic control hazard classification system is created. Hazard data from the aviation safety management system is selected as the corpus, then classified and labeled at 5 levels. The experiment uses a TF-IDF and TextRank text classification method based on key content extraction, together with text classification models based on CNN and BERT, to address the small sample size, large number of labels, and random samples that characterize air traffic control hazard data. The results show that the overall cost-effectiveness, which weighs model training time against classification accuracy, is highest when about 8 keywords are extracted. As the number of points increases, the time spent on dimensioning decreases, which affects accuracy. When the number of points reaches about 93, the time spent on dimensioning increases while classification accuracy stays close to 0.7, so the added time lowers the overall cost-effectiveness. These results indicate that extracting key content can address text classification problems with small samples and can support further research on the development of safety systems.
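The key content extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain TF-IDF scoring with a smoothed IDF, a simple regex tokenizer, and invented example reports, and it leaves out the TextRank step and the CNN/BERT classifiers entirely. The function name `tfidf_keywords` and the parameter `k` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, k=8):
    """Return up to k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF (an assumption; other variants are equally valid).
        scores = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


# Hypothetical hazard reports, invented for illustration only.
reports = [
    "controller fatigue during night shift led to delayed clearance",
    "radar display failure during night shift reduced traffic awareness",
    "controller misheard readback and issued wrong clearance",
]
keywords = tfidf_keywords(reports, k=3)
```

In a pipeline like the one the abstract describes, the extracted keywords would replace or condense each report before it reaches the classifier, and `k` would be tuned against training time and accuracy.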

          • Published in

            ACM Transactions on Asian and Low-Resource Language Information Processing, Just Accepted
            ISSN:2375-4699
            EISSN:2375-4702

            Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Online AM: 19 February 2024
            • Accepted: 31 January 2024
            • Revised: 24 December 2023
            • Received: 30 October 2023

            Qualifiers

            • research-article