
Multi-keypoints matching network for clothing detection

  • Original article
  • Published in The Visual Computer

Abstract

Clothing detection is an active research topic, with applications in identifying specific clothing categories such as long-sleeved and short-sleeved garments. Image-based clothing detection requires the model to localize clothing accurately. Current approaches fall into two main categories. Top-down methods are anchor-based and compute the intersection over union between anchor boxes and ground-truth bounding boxes; they are constrained by the anchor-box settings and perform poorly when clothing scale varies widely. Bottom-up methods use a feature extraction network to predict keypoints and derive the position and size of the clothing from them, but the keypoint predictions often contain small errors because they lack the internal information of the clothing. To address these issues, we propose a multi-keypoints matching network for clothing detection (MKMnet) based on the bottom-up paradigm. It detects three keypoints per garment (the top-left corner, the bottom-right corner, and the center point) to ensure high detection accuracy. We first match corner keypoints by computing the distance between the embedding vectors of different corners to obtain initial bounding boxes, and then obtain the final bounding boxes by matching the center point. Constructing bounding boxes from corner matching enables the model to detect clothing of any scale and shape, and the additional center-point verification eliminates a large number of false-positive bounding boxes. Through this combination of corner and center keypoints, MKMnet obtains accurate bounding boxes and improves the accuracy of clothing recognition. Experimental results show that MKMnet achieves higher accuracy than existing methods.
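The matching pipeline described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes, as in CornerNet-style detectors, that each predicted corner carries a scalar embedding, that corners of the same instance have nearby embeddings, and (as a hypothetical choice) that the verification region is the central third of the candidate box; the function and parameter names are ours.

```python
def match_corners(tl_corners, br_corners, tl_embs, br_embs,
                  centers, emb_thresh=0.5):
    """Sketch of corner matching with center-point verification.

    tl_corners, br_corners: lists of (x, y) predicted corner positions.
    tl_embs, br_embs: one embedding scalar per corner.
    centers: list of (x, y) predicted center keypoints.
    Returns a list of (x1, y1, x2, y2) boxes that survive both checks.
    """
    boxes = []
    for i, (tx, ty) in enumerate(tl_corners):
        for j, (bx, by) in enumerate(br_corners):
            # Geometric sanity: bottom-right must lie below and to the
            # right of top-left.
            if bx <= tx or by <= ty:
                continue
            # Embedding match: close embeddings indicate the two corners
            # belong to the same clothing instance.
            if abs(tl_embs[i] - br_embs[j]) > emb_thresh:
                continue
            # Center verification: keep the candidate box only if some
            # predicted center falls inside its central region
            # (here, the middle third of the box).
            cx0, cx1 = tx + (bx - tx) / 3, bx - (bx - tx) / 3
            cy0, cy1 = ty + (by - ty) / 3, by - (by - ty) / 3
            if any(cx0 <= cx <= cx1 and cy0 <= cy <= cy1
                   for cx, cy in centers):
                boxes.append((tx, ty, bx, by))
    return boxes
```

A corner pair with matching embeddings but no supporting center keypoint is discarded, which is how the center-point check suppresses false-positive boxes.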


Availability of data

The data that support the findings of this study are available online. These datasets were derived from the following public domain resources: [DeepFashion2].


Funding

This work was supported in part by the Young People Fund of the Xinjiang Science and Technology Department (No. 2022D01B05).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ye Li.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, Y., Zhang, W., Wu, M. et al. Multi-keypoints matching network for clothing detection. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03337-y

