RC-Net: Row and Column Network with Text Feature for Parsing Floor Plan Images

Wang, Teng; Meng, Wei-Liang; Lu, Zheng-Da; Guo, Jian-Wei; Xiao, Jun; Zhang, Xiao-Peng

doi:10.1007/s11390-023-3117-x

Regular Paper
Published: 30 May 2023

Volume 38, pages 526–539 (2023)

Journal of Computer Science and Technology

Teng Wang^1,2,
Wei-Liang Meng^1,2,
Zheng-Da Lu^2,
Jian-Wei Guo^1,2,
Jun Xiao^2 &
…
Xiao-Peng Zhang^1,2

Abstract

The popularity of online home design and floor plan customization has been steadily increasing. However, manually converting floor plan images from books or paper materials into electronic resources is challenging because of the vast amount of historical data. Leveraging neural networks to identify and parse floor plans can significantly streamline this conversion. In this paper, we present a novel learning framework for automatically parsing floor plan images. Our key insight is that room type text is very common and crucial in floor plan images, as it carries important semantic information about the corresponding room. However, this clue is rarely considered in previous learning-based methods. In contrast, we propose the Row and Column network (RC-Net), which recognizes floor plan elements by integrating the text feature. Specifically, we add a text feature branch to the network to extract text features corresponding to the room type, which guide room type predictions. More importantly, we formulate the Row and Column constraint module (RC constraint module) to share and constrain features across entire rows and columns of the feature maps so that, as far as possible, only one type is predicted within each room, making the segmentation boundaries between different rooms more regular and cleaner. Extensive experiments on three benchmark datasets validate that our framework substantially outperforms other state-of-the-art approaches in terms of FWIoU, mACC, and mIoU.
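
The three metrics named in the abstract are commonly computed from a per-pixel confusion matrix. The following is a minimal sketch using their standard definitions (mean intersection over union, mean per-class accuracy, and frequency-weighted IoU); it is not taken from the paper. The function names, the flat label lists, and the choice to skip classes absent from the ground truth are assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
from typing import Dict, List


def confusion_matrix(truth: List[int], pred: List[int], num_classes: int) -> List[List[int]]:
    """Count pixels: rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Return mIoU, mACC and FWIoU using the standard definitions."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    ious, accs, fwiou = [], [], 0.0
    for c in range(n):
        tp = cm[c][c]                          # pixels correctly labelled c
        gt = sum(cm[c])                        # pixels whose true class is c
        pr = sum(cm[r][c] for r in range(n))   # pixels predicted as c
        if gt == 0:
            continue  # assumption: classes absent from the ground truth are skipped
        iou = tp / (gt + pr - tp)
        ious.append(iou)
        accs.append(tp / gt)
        fwiou += (gt / total) * iou            # IoU weighted by class frequency
    return {
        "mIoU": sum(ious) / len(ious),
        "mACC": sum(accs) / len(accs),
        "FWIoU": fwiou,
    }


# Hypothetical flattened label maps for a tiny 8-pixel image with 3 classes.
truth = [0, 0, 0, 0, 1, 1, 2, 2]
pred = [0, 0, 0, 1, 1, 1, 2, 0]
metrics = segmentation_metrics(confusion_matrix(truth, pred, num_classes=3))
print(metrics)
```

FWIoU weights each class by its share of ground-truth pixels, so large rooms dominate it, whereas mIoU and mACC treat every class equally regardless of area.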

Author information

Authors and Affiliations

State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Teng Wang, Wei-Liang Meng, Jian-Wei Guo & Xiao-Peng Zhang
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
Teng Wang, Wei-Liang Meng, Zheng-Da Lu, Jian-Wei Guo, Jun Xiao & Xiao-Peng Zhang

Corresponding authors

Correspondence to Jian-Wei Guo or Jun Xiao.

Additional information

Associate Professor Guo supervised this project, helped to implement the experiments, and played a key role in promoting efficient and accurate communication. Professor Xiao contributed substantially to improving the experiments and was crucial in conveying information accurately in an English-speaking context.

Supplementary Information

ESM 1

(PDF 838 kb)

About this article

Cite this article

Wang, T., Meng, WL., Lu, ZD. et al. RC-Net: Row and Column Network with Text Feature for Parsing Floor Plan Images. J. Comput. Sci. Technol. 38, 526–539 (2023). https://doi.org/10.1007/s11390-023-3117-x

Received: 21 January 2023
Accepted: 24 May 2023
Published: 30 May 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11390-023-3117-x
