Image-Based Flow Prediction of Vocal Folds Using 3D Convolutional Neural Networks

Zhang, Yang; Pu, Tianmei; Xu, Jiasen; Zhou, Chunhua

doi:10.1007/s42235-023-00466-3

Research Article
Published: 12 January 2024

Volume 21, pages 991–1002 (2024)

Journal of Bionic Engineering

Yang Zhang (ORCID: orcid.org/0000-0002-2605-6593)¹, Tianmei Pu², Jiasen Xu¹ & Chunhua Zhou³

Abstract

This work proposes a three-dimensional (3D) convolutional neural network (CNN) model, based on image slices of various normal and pathological vocal folds, for accurate and efficient prediction of glottal flows. The 3D CNN model consists of a feature extraction block and a regression block. The feature extraction block learns low-dimensional features from the high-dimensional image data of the glottal shape; the regression block flattens the output of the feature extraction block and produces the desired glottal flow data. The input image data are a condensed set of 2D image slices captured in the axial plane of the 3D vocal folds, where the glottal shapes are synthesized from the equations of normal vibration modes. The output flow data are the corresponding flow rate, averaged glottal pressure, and nodal pressure distribution over the glottal surface. The 3D CNN model thus establishes the mapping between the input image data and the output flow data. The ground-truth flow variables for each glottal shape in the training and test datasets are obtained with a high-fidelity sharp-interface immersed-boundary solver, and the trained model is then used to predict these flow variables for the glottal shapes in the test set. The present 3D CNN model is more efficient than traditional Computational Fluid Dynamics (CFD) models while retaining accuracy, and it is more powerful than previous data-driven prediction models because it provides more details of the glottal flow. The accuracy and efficiency of the trained 3D CNN model indicate that it could be promising for future clinical applications.
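
The two-block structure described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' network: the single convolution layer, the 2×2×2 kernel, the pooling size, the constant weights, and the choice of four surface nodes are all assumptions made only to show how a stack of axial slices passes through a feature extraction block (3D convolution, ReLU, max pooling) and a regression block (flatten, fully connected layer) to give a flow rate, an averaged pressure, and nodal pressures.

```python
# Illustrative sketch only: layer sizes, weights, and names are assumptions,
# not the architecture reported in the article.
from itertools import product


def conv3d(vol, kernel):
    """Valid (no padding, stride 1) 3D cross-correlation, single channel."""
    k = len(kernel)  # cubic kernel of size k x k x k
    d, h, w = len(vol) - k + 1, len(vol[0]) - k + 1, len(vol[0][0]) - k + 1
    return [[[sum(vol[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                  for dz, dy, dx in product(range(k), repeat=3))
              for x in range(w)] for y in range(h)] for z in range(d)]


def relu(vol):
    """Element-wise rectified linear unit."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in vol]


def max_pool3d(vol, p):
    """Non-overlapping max pooling with a cubic window of size p."""
    d, h, w = len(vol) // p, len(vol[0]) // p, len(vol[0][0]) // p
    return [[[max(vol[z * p + dz][y * p + dy][x * p + dx]
                  for dz, dy, dx in product(range(p), repeat=3))
              for x in range(w)] for y in range(h)] for z in range(d)]


def flatten(vol):
    """Turn a 3D feature volume into a 1D feature vector."""
    return [v for plane in vol for row in plane for v in row]


def dense(features, weights, bias):
    """Fully connected layer: one weight row and one bias per output."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def predict(slices, kernel, weights, bias):
    # Feature extraction block: convolution -> ReLU -> pooling.
    features = flatten(max_pool3d(relu(conv3d(slices, kernel)), 2))
    # Regression block: flattened features -> flow variables.
    return dense(features, weights, bias)


# Hypothetical input: 5 axial slices of 5 x 5 pixels, all set to 1.0.
slices = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
n_nodes = 4                      # assumed number of glottal surface nodes
n_outputs = 2 + n_nodes          # flow rate, averaged pressure, nodal pressures
weights = [[0.125] * 8 for _ in range(n_outputs)]  # 8 = 2 x 2 x 2 pooled features
bias = [0.0] * n_outputs

outputs = predict(slices, kernel, weights, bias)
flow_rate, mean_pressure, nodal_pressure = outputs[0], outputs[1], outputs[2:]
```

In a trained model the kernel and dense weights would be learned from the solver-generated ground truth rather than fixed by hand, and a deep learning framework would replace these explicit loops.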

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work is supported by the Open Project of Key Laboratory of Computational Aerodynamics, AVIC Aerodynamics Research Institute (Grant No. YL2022XFX0409).

Author information

Authors and Affiliations

College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
Yang Zhang & Jiasen Xu
College of Electrical, Energy and Power Engineering, Yangzhou University, Yangzhou, 225127, China
Tianmei Pu
College of Aeronautics, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
Chunhua Zhou

Corresponding author

Correspondence to Yang Zhang.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Pu, T., Xu, J. et al. Image-Based Flow Prediction of Vocal Folds Using 3D Convolutional Neural Networks. J Bionic Eng 21, 991–1002 (2024). https://doi.org/10.1007/s42235-023-00466-3

Received: 05 June 2023
Revised: 09 November 2023
Accepted: 07 December 2023
Published: 12 January 2024
Issue Date: March 2024
DOI: https://doi.org/10.1007/s42235-023-00466-3

Keywords
