
A dual progressive strategy for long-tailed visual recognition

  • Original Paper
  • Published:
Machine Vision and Applications

Abstract

Unlike the roughly balanced datasets used in most experiments, datasets encountered in practice commonly exhibit a long-tail phenomenon. Previous work has typically used re-sampling, re-weighting, or ensemble learning to mitigate the long-tail problem; the first two are the most widely adopted (and are the ones we build on) owing to their generality. However, weighting classes directly by the inverse of their sample sizes may not be a good strategy, as it often sacrifices the performance of the head classes. We propose a new cost-allocation approach consisting of two parts: the first part is trained without weighting to ensure that the network fits the head-class data adequately; the second part then dynamically assigns weights based on the relative difficulty of each class. In addition, we propose a novel, practical GrabCut-based data augmentation approach to increase the diversity and differentiation of the mid- and tail-class data. Extensive experiments on public and self-constructed long-tailed datasets demonstrate the effectiveness of our approach, which achieves excellent performance.
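The two-part cost allocation described above can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: the function name `class_weights`, the warm-up schedule, and the use of `1 - accuracy` as the per-class difficulty signal are all assumptions; the abstract only specifies an unweighted first stage followed by difficulty-driven dynamic weighting.

```python
def class_weights(per_class_accuracy, epoch, warmup_epochs, tau=1.0):
    """Two-stage weight assignment (a sketch, assuming difficulty = 1 - accuracy).

    Stage 1 (epoch < warmup_epochs): all weights are 1, i.e. unweighted
    training, so the network fits the abundant head classes adequately.
    Stage 2: weights are set from relative class difficulty, raised to an
    assumed temperature `tau`, and normalised to mean 1 so the overall loss
    scale stays stable.
    """
    n = len(per_class_accuracy)
    if epoch < warmup_epochs:
        return [1.0] * n
    difficulty = [(1.0 - a) ** tau for a in per_class_accuracy]
    mean_d = sum(difficulty) / n
    return [d / mean_d for d in difficulty]
```

In a training loop, these weights would be recomputed each epoch from validation accuracy and passed to a class-weighted cross-entropy loss; harder (typically tail) classes receive weights above 1, easier head classes below 1.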
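The GrabCut-based augmentation amounts to extracting a foreground object and compositing it onto a new background to diversify mid- and tail-class images. The segmentation itself would come from an iterated graph-cut segmenter such as OpenCV's `cv2.grabCut`; the sketch below assumes that binary mask is already available and shows only the compositing step. The function name `composite_foreground` and the shapes are illustrative assumptions.

```python
import numpy as np

def composite_foreground(fg_img, fg_mask, bg_img):
    """Paste a segmented foreground onto a new background image.

    fg_mask is the binary mask (1 = foreground) that a segmenter such as
    GrabCut would output. fg_img and bg_img have shape (H, W, 3); fg_mask
    has shape (H, W). Foreground pixels come from fg_img, the rest from
    bg_img.
    """
    mask3 = fg_mask[..., None].astype(fg_img.dtype)  # broadcast over channels
    return fg_img * mask3 + bg_img * (1 - mask3)
```

Applying this with backgrounds drawn from other images increases the diversity of the scarce mid- and tail-class samples without altering the object itself.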


Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Code Availability

The code that supports the findings of this study is available from the corresponding author upon reasonable request.


Acknowledgements

The work described in this paper was supported by the National Natural Science Foundation of China (No. 61673396) and the Natural Science Foundation of Shandong Province (No. ZR2022MF260).


Author information

Authors and Affiliations

Authors

Contributions

Q.Z. contributed to data curation; M.S. contributed to funding acquisition; G.C. contributed to writing (original draft); H.L. contributed to writing (review and editing). All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Guoqing Cao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liang, H., Cao, G., Shao, M. et al. A dual progressive strategy for long-tailed visual recognition. Machine Vision and Applications 35, 1 (2024). https://doi.org/10.1007/s00138-023-01480-5


  • DOI: https://doi.org/10.1007/s00138-023-01480-5
