Multinomial Logistic Factor Regression for Multi-source Functional Block-wise Missing Data

  • Theory and Methods
  • Psychometrika

Abstract

With the rapid development of big data and medical technology, multi-source functional block-wise missing data arise increasingly often in medical care, so there is an urgent need for efficient dimension reduction methods that extract the information important for classification from such data. However, most existing methods for classification problems consider high-dimensional data as covariates. In this paper, we propose a novel multinomial imputed-factor logistic regression model with multi-source functional block-wise missing data as covariates. Our main contribution is to establish two multinomial factor regression models that use the imputed multi-source functional principal component scores and the imputed canonical scores as covariates, respectively, where the missing factors are imputed by both the conditional mean imputation and the multiple block-wise imputation approaches. Specifically, univariate functional principal component analysis (FPCA) is first carried out on the observable data of each data source to obtain the univariate principal component scores and the eigenfunctions. Then, the block-wise missing univariate principal component scores, rather than the block-wise missing functional data themselves, are imputed by the conditional mean imputation method and the multiple block-wise imputation method, respectively. After that, based on the imputed univariate factors, the multi-source principal component scores are constructed from the relationship between the multi-source principal component scores and the univariate principal component scores; at the same time, the canonical scores are obtained by multiple-set canonical correlation analysis. Finally, the multinomial imputed-factor logistic regression model is established with the multi-source principal component scores or the canonical scores as factors. Numerical simulations and a real data analysis on ADNI data show that the proposed method works well.
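To make the imputation step more concrete, the following is a minimal sketch of conditional mean imputation applied to a matrix of stacked univariate principal component scores. It is an illustration under stated assumptions, not the authors' implementation: the function name `conditional_mean_impute` is hypothetical, the scores are treated as approximately jointly Gaussian, and the mean and covariance are estimated from complete rows only (the paper may estimate them differently). Missing blocks are marked with NaN.

```python
import numpy as np


def conditional_mean_impute(scores):
    """Fill NaN blocks in an (n, p) score matrix with Gaussian conditional means.

    Assumption (not from the paper): the mean and covariance are estimated
    from the complete rows only, so a reasonable number of complete rows
    must exist.
    """
    scores = np.asarray(scores, dtype=float)
    missing = np.isnan(scores)
    complete = ~missing.any(axis=1)
    mu = scores[complete].mean(axis=0)
    sigma = np.cov(scores[complete], rowvar=False)

    imputed = scores.copy()
    if complete.all():
        return imputed
    # Rows with the same missing pattern share the same regression coefficients.
    for pattern in np.unique(missing[~complete], axis=0):
        rows = (missing == pattern).all(axis=1)
        mis, obs = pattern, ~pattern
        if not obs.any():
            # Nothing observed for these rows: fall back to the marginal mean.
            imputed[np.ix_(rows, mis)] = mu[mis]
            continue
        # B = Sigma_oo^{-1} Sigma_om, so E[x_m | x_o] = mu_m + (x_o - mu_o) B.
        coef = np.linalg.solve(sigma[np.ix_(obs, obs)], sigma[np.ix_(obs, mis)])
        centered = scores[np.ix_(rows, obs)] - mu[obs]
        imputed[np.ix_(rows, mis)] = mu[mis] + centered @ coef
    return imputed


# Toy check: three score columns, the third an exact linear function of the
# first two, with the third "source" missing for the last 20 subjects.
rng = np.random.default_rng(0)
first_two = rng.normal(size=(50, 2))
full = np.column_stack([first_two, first_two @ np.array([1.0, 2.0]) + 3.0])
observed = full.copy()
observed[30:, 2] = np.nan
filled = conditional_mean_impute(observed)
```

In this toy setting the missing column is a deterministic linear function of the observed ones, so the conditional mean recovers it up to floating-point error. With real scores the imputed values are only the best linear predictions given the observed blocks, and the imputed matrix would then feed the construction of multi-source scores and the multinomial logistic regression described in the abstract.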

Acknowledgements

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. We also thank the editors and three anonymous reviewers for their constructive comments, which helped improve the quality of this article.

Funding

This research is supported by the National Social Science Foundation of China (No. 21BTJ044).

Author information

Corresponding author

Correspondence to Xiuli Du.

Additional information

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wpcontent/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (zip 1421 KB)

Supplementary file 2 (pdf 524 KB)

About this article

Cite this article

Du, X., Jiang, X., Lin, J. et al. Multinomial Logistic Factor Regression for Multi-source Functional Block-wise Missing Data. Psychometrika 88, 975–1001 (2023). https://doi.org/10.1007/s11336-023-09918-5

  • DOI: https://doi.org/10.1007/s11336-023-09918-5
