Abstract
A formidable challenge in multi-label text classification (MLTC) is that the labels often exhibit a long-tailed distribution, which typically prevents deep MLTC models from achieving satisfactory performance. To alleviate this problem, most existing solutions attempt to improve tail performance by means of sampling or by introducing extra knowledge. Data-rich head labels, though more trustworthy, have not received the attention they deserve. In this work, we propose a multi-stage training framework that exploits both model- and feature-level knowledge from the head labels to improve the representation and generalization ability of MLTC models. Moreover, we theoretically prove the superiority of our framework design over alternatives. Comprehensive experiments on widely used MLTC datasets demonstrate that the proposed framework substantially outperforms state-of-the-art methods, highlighting the value of head labels in MLTC.
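To make the long-tailed setting concrete, the sketch below partitions a label set into head and tail groups by training frequency, which is the premise the framework builds on. This is an illustrative toy, not the paper's actual procedure; the `head_fraction` threshold and the helper name `split_head_tail` are assumptions for the example.

```python
from collections import Counter

def split_head_tail(label_lists, head_fraction=0.2):
    """Partition labels by frequency: the most frequent `head_fraction`
    of labels form the data-rich head, the remainder the tail.
    (Illustrative helper; the threshold is an assumption, not from the paper.)"""
    freq = Counter(label for labels in label_lists for label in labels)
    ranked = [label for label, _ in freq.most_common()]
    k = max(1, int(len(ranked) * head_fraction))
    return set(ranked[:k]), set(ranked[k:])

# Toy long-tailed label assignments for six documents:
# "sports" appears 4 times, the other labels far less often.
docs = [["sports"], ["sports", "news"], ["sports"], ["news"],
        ["sports", "finance"], ["art"]]
head, tail = split_head_tail(docs, head_fraction=0.25)
print(head)  # {'sports'}
```

A multi-stage scheme in the spirit of the abstract would first train on the head partition to obtain reliable representations, then reuse that model (and its features) when training on the full label set.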
On the Value of Head Labels in Multi-Label Text Classification