A semi-supervised method to generate a Persian dataset for suggestion classification

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

Suggestion mining has become a popular topic in natural language processing (NLP), with applications in areas such as service and product improvement. This study provides an automated, machine learning (ML) based approach to extracting suggestions from Persian text. First, a novel two-step semi-supervised method is proposed to generate a Persian dataset called ParsSugg, which is then used for the automatic classification of user suggestions. In the first step, data are labeled manually according to a proposed guideline and then expanded in a data augmentation phase. In the second step, a pre-trained Persian Bidirectional Encoder Representations from Transformers model (ParsBERT) is used as a classifier, together with the data from the first step, to label more data. The performance of several ML models, namely Support Vector Machine (SVM), Random Forest (RF), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and the ParsBERT language model, is examined on the generated dataset. For the suggestion class, ParsBERT obtained an F-score of 97.27 and the SVM and CNN classifiers about 94.5, a promising result for the first study of suggestion classification on Persian text. In addition, the proposed guideline can be used for other NLP tasks, and the generated dataset can be used in other suggestion classification tasks.
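The second step of the method and the reported metric can be illustrated with a minimal Python sketch. It is not the authors' implementation. The confidence threshold, the function names, and the keyword-based `toy_classifier` (standing in for a fine-tuned ParsBERT model, with English text for readability) are all assumptions made for illustration; the abstract does not say how model predictions were accepted as labels.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# A classifier maps a text to (label, confidence).
# Label 1 = suggestion, label 0 = non-suggestion.
Classifier = Callable[[str], Tuple[int, float]]


def pseudo_label(classify: Classifier,
                 unlabeled: Iterable[str],
                 threshold: float = 0.9) -> List[Tuple[str, int]]:
    """Label unlabeled texts with a trained classifier.

    Assumption: only predictions whose confidence reaches `threshold`
    are kept. The threshold is hypothetical, not taken from the paper.
    """
    accepted = []
    for text in unlabeled:
        label, confidence = classify(text)
        if confidence >= threshold:
            accepted.append((text, label))
    return accepted


def f_score(gold: Sequence[int],
            predicted: Sequence[int],
            positive: int = 1) -> float:
    """F1 score for one class (here, the suggestion class)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def toy_classifier(text: str) -> Tuple[int, float]:
    """Hypothetical stand-in for a fine-tuned model: counts cue words."""
    cues = ("should", "please", "recommend")
    hits = sum(cue in text.lower() for cue in cues)
    if hits >= 2:
        return 1, 0.95
    if hits == 1:
        return 1, 0.8
    return 0, 0.9


if __name__ == "__main__":
    pool = [
        "You should add more towels.",
        "Please recommend a better breakfast.",
        "The room was clean and quiet.",
    ]
    for text, label in pseudo_label(toy_classifier, pool):
        print(label, text)
    print(f_score([1, 1, 0, 0], [1, 0, 0, 1]))
```

In this sketch the first review is predicted as a suggestion but falls below the threshold, so it is left unlabeled rather than added to the dataset. Accepted pairs would be merged with the manually labeled and augmented data from the first step.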

Notes

  1. https://hotelyar.com.

  2. https://www.mobile.ir/.

  3. https://www.snapptrip.com/.

  4. https://competitions.codalab.org/competitions/19955.

  5. Language Model-based Over-sampling Technique.

  6. https://www.snapptrip.com/.

  7. https://appen.com/.

  8. https://porsline.ir/.

  9. https://pypi.org/project/googletrans/.

  10. https://www.iranhotelonline.com/.

  11. https://www.jabama.com/.

  12. https://jainjas.com/.

  13. https://github.com/ZanyarMoh/SuggestionClassification/tree/main/datasets.

  14. https://github.com/google-research/bert.

  15. https://pypi.org/project/hazm/.

  16. https://pypi.org/project/demoji/.

  17. https://github.com/ZanyarMoh/SuggestionClassification/tree/main/classifiers.

  18. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html.

  19. https://github.com/m3hrdadfi/albert-persian.

Author information

Corresponding author

Correspondence to Leila Safari.

About this article

Cite this article

Safari, L., Mohammady, Z. A semi-supervised method to generate a persian dataset for suggestion classification. Lang Resources & Evaluation (2023). https://doi.org/10.1007/s10579-023-09688-7

  • DOI: https://doi.org/10.1007/s10579-023-09688-7
