Abstract
Clothing detection, which localizes garments in an image and identifies their specific category (e.g., long-sleeved or short-sleeved), is an active research topic. Existing approaches fall into two categories. Top-down methods are anchor-based and compute the intersection over union between anchor boxes and ground-truth bounding boxes; they are constrained by the anchor settings and perform poorly when clothing scales vary widely. Bottom-up methods use a feature-extraction network to predict keypoints and infer the position and size of the clothing from them, but the keypoint predictions often contain small errors because they lack information about the interior of the garment. To address these issues, we propose a multi-keypoints matching network for clothing detection (MKMnet) based on the bottom-up paradigm. It detects three keypoints per object (the top-left corner, the bottom-right corner, and the center point) to achieve high detection accuracy. We first match corner keypoints by computing the distance between the embedding vectors of candidate corner pairs, yielding initial bounding boxes, and then obtain the final bounding boxes by matching each candidate against a center point. Grouping corners in this way allows the model to detect clothing of any scale and shape, while the additional center-point verification eliminates a large number of false-positive boxes. MKMnet thus obtains accurate bounding boxes through the linear combination of corner and center points, improving the accuracy of clothing recognition. Experimental results show that MKMnet achieves higher accuracy than existing methods.
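The grouping step described above can be sketched in code. The following is a minimal illustration of the general corner-grouping-with-center-verification idea (in the spirit of CornerNet/CenterNet-style detectors), not the authors' implementation: the function names, the embedding threshold, and the choice of the middle third of the box as the central region are all illustrative assumptions.

```python
def group_keypoints(tl_points, br_points, centers, embed_thresh=0.5):
    """Pair top-left and bottom-right corners whose 1-D embeddings are
    close, then keep only candidate boxes whose central region contains
    a predicted center keypoint (illustrative sketch, not MKMnet itself)."""
    candidates = []
    for tlx, tly, tl_emb in tl_points:
        for brx, bry, br_emb in br_points:
            # A valid pair must form a proper box (corner order is correct)
            # and the two embeddings must be close enough to belong together.
            if brx <= tlx or bry <= tly:
                continue
            if abs(tl_emb - br_emb) > embed_thresh:
                continue
            candidates.append((tlx, tly, brx, bry))

    verified = []
    for x1, y1, x2, y2 in candidates:
        # Central region: the middle third of the candidate box
        # (an assumed definition for this sketch).
        cx1, cy1 = x1 + (x2 - x1) / 3, y1 + (y2 - y1) / 3
        cx2, cy2 = x2 - (x2 - x1) / 3, y2 - (y2 - y1) / 3
        if any(cx1 <= cx <= cx2 and cy1 <= cy <= cy2 for cx, cy in centers):
            verified.append((x1, y1, x2, y2))
    return verified
```

For example, a corner pair with mismatched embeddings is rejected at the matching stage, and a matched pair whose box contains no center keypoint is rejected at the verification stage, which is how false positives are suppressed.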
Availability of data
The data that support the findings of this study are available online. These datasets were derived from the following public domain resources: [DeepFashion2].
References
Chen, H., Gallagher, A., Girod, B.: Describing clothing by semantic attributes. In: Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part III 12, pp. 609–623 (2012). Springer
Yan, S., Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: Unconstrained fashion landmark detection via hierarchical recurrent transformer networks. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 172–180 (2017)
Wang, W., Xu, Y., Shen, J., Zhu, S.-C.: Attentive fashion grammar network for fashion landmark detection and clothing category classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4271–4280 (2018)
Yamaguchi, K., Hadi Kiapour, M., Berg, T.L.: Paper doll parsing: Retrieving similar styles to parse clothing items. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3519–3526 (2013)
Ji, X., Wang, W., Zhang, M., Yang, Y.: Cross-domain image retrieval with attention modeling. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 1654–1662 (2017)
Liao, L., He, X., Zhao, B., Ngo, C.-W., Chua, T.-S.: Interpretable multimodal retrieval for fashion products. In: Proceedings of the 26th ACM International Conference on Multimedia, pp. 1571–1579 (2018)
Law, H., Deng, J.: Cornernet: Detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 734–750 (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Ge, Y., Zhang, R., Wang, X., Tang, X., Luo, P.: Deepfashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5337–5345 (2019)
Chen, M., Qin, Y., Qi, L., Sun, Y.: Improving fashion landmark detection by dual attention feature enhancement. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)
Shajini, M., Ramanan, A.: An improved landmark-driven and spatial-channel attentive convolutional neural network for fashion clothes classification. Vis. Comput. 37(6), 1517–1526 (2021)
Lin, T.-H.: Aggregation and finetuning for clothes landmark detection. arXiv preprint arXiv:2005.00419 (2020)
Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4974–4983 (2019)
Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5693–5703 (2019)
Majuran, S., Ramanan, A.: A single-stage fashion clothing detection using multilevel visual attention. Vis. Comput., 1–15 (2022)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: Single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pp. 21–37 (2016). Springer
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Yang, W., Luo, P., Lin, L.: Clothing co-parsing by joint image segmentation and labeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3182–3189 (2014)
Hadi Kiapour, M., Han, X., Lazebnik, S., Berg, A.C., Berg, T.L.: Where to buy it: Matching street clothing photos in online shops. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3343–3351 (2015)
Zheng, S., Yang, F., Kiapour, M.H., Piramuthu, R.: Modanet: A large-scale street fashion dataset with polygon annotations. In: Proceedings of the 26th ACM International Conference on Multimedia, pp. 1670–1678 (2018)
Sidnev, A., Krapivin, A., Trushkov, A., Krasikova, E., Kazakov, M.: Deepmark++: Centernet-based clothing detection (2020)
Kim, H.J., Lee, D.H., Niaz, A., Kim, C.Y., Memon, A.A., Choi, K.N.: Multiple-clothing detection and fashion landmark estimation using a single-stage detector. IEEE Access 9, 11694–11704 (2021)
Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: Centernet: Keypoint triplets for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6569–6578 (2019)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: Centernet++ for object detection. arXiv preprint arXiv:2204.08394 (2022)
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755 (2014). Springer
Yang, Z., Liu, S., Hu, H., Wang, L., Lin, S.: Reppoints: Point set representation for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9657–9666 (2019)
Tian, Q., Chanda, S., Gray, D.: Improving apparel detection with category grouping and multi-grained branches. Multimedia Tools and Applications 82(5), 7383–7400 (2023)
Funding
This work was supported in part by the Young People Fund of the Xinjiang Science and Technology Department (No. 2022D01B05).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Zhang, W., Wu, M. et al. Multi-keypoints matching network for clothing detection. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03337-y