Abstract
Multi-source functional block-wise missing data have become increasingly common in medical care with the rapid development of big data and medical technology, so there is an urgent need for efficient dimension-reduction methods that extract the information relevant to classification from such data. However, most existing classification methods treat high-dimensional data as covariates. In this paper, we propose a novel multinomial imputed-factor Logistic regression model with multi-source functional block-wise missing data as covariates. Our main contribution is to establish two multinomial factor regression models that use, respectively, the imputed multi-source functional principal component scores and the imputed canonical scores as covariates, where the missing factors are imputed by both the conditional mean imputation and the multiple block-wise imputation approaches. Specifically, univariate FPCA is first carried out on the observed data of each data source to obtain the univariate principal component scores and eigenfunctions. Then, instead of the block-wise missing functional data themselves, the block-wise missing univariate principal component scores are imputed by the conditional mean imputation method and by the multiple block-wise imputation method, respectively. Next, based on the imputed univariate factors, the multi-source principal component scores are constructed from the relationship between the multi-source and the univariate principal component scores; in parallel, the canonical scores are obtained by multiple-set canonical correlation analysis. Finally, the multinomial imputed-factor Logistic regression model is built with either the multi-source principal component scores or the canonical scores as factors. Numerical simulations and a real data analysis of ADNI data show that the proposed method performs well.
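The conditional mean imputation step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a subject's stacked univariate principal component scores are jointly Gaussian with a known mean vector `mu` and covariance matrix `sigma`, so that the missing block is filled with E[z_mis | z_obs] = mu_mis + Sigma_mis,obs Sigma_obs,obs^{-1} (z_obs - mu_obs). In practice `mu` and `sigma` would have to be estimated from subjects who observe the relevant sources, and the estimator used in the paper may differ; the multiple block-wise imputation approach is not shown. The function names `solve` and `conditional_mean_impute` are hypothetical, and the code uses only the Python standard library.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [A | B], built from copies so the inputs are untouched.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        # Partial pivoting: move the row with the largest |entry| in column c up.
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("observed-block covariance is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        # Eliminate column c from every other row.
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def conditional_mean_impute(
    z: Sequence[Optional[float]], mu: Sequence[float], sigma: Matrix
) -> List[float]:
    """Replace None entries of z by E[z_mis | z_obs] under a Gaussian model.

    z     : one subject's stacked univariate PC scores, None where the
            corresponding source (block) is missing.
    mu    : mean vector of the stacked scores (assumed known here).
    sigma : covariance matrix of the stacked scores (assumed known here).
    """
    if not (len(z) == len(mu) == len(sigma)):
        raise ValueError("z, mu and sigma must have matching dimensions")
    obs = [i for i, v in enumerate(z) if v is not None]
    mis = [i for i, v in enumerate(z) if v is None]
    # Start from the marginal mean for missing entries, observed values elsewhere.
    filled = [mu[i] if v is None else v for i, v in enumerate(z)]
    if not mis or not obs:
        # Nothing to impute, or nothing to condition on: keep the marginal mean.
        return filled
    s_oo = [[sigma[i][j] for j in obs] for i in obs]
    resid = [[z[i] - mu[i]] for i in obs]  # column vector z_obs - mu_obs
    w = solve(s_oo, resid)                 # Sigma_oo^{-1} (z_obs - mu_obs)
    for m in mis:
        # mu_m + Sigma_{m,obs} Sigma_oo^{-1} (z_obs - mu_obs)
        filled[m] = mu[m] + sum(sigma[m][o] * w[k][0] for k, o in enumerate(obs))
    return filled


if __name__ == "__main__":
    mu = [0.0, 0.0, 0.0]
    sigma = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(conditional_mean_impute([2.0, None, None], mu, sigma))  # [2.0, 1.0, 0.0]
```

In the example, the second score is pulled toward the observed first score through their covariance of 0.5, while the third score, uncorrelated with the observed block, stays at its mean. The sketch solves the linear system for the observed block rather than forming an explicit inverse, which is the usual numerically safer choice.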
Acknowledgements
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. We also thank the editors and three anonymous reviewers for their constructive comments, which helped improve the quality of this article.
Funding
This research is supported by the National Social Science Foundation of China (No. 21BTJ044).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wpcontent/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
Supplementary Information
Electronic supplementary material is available with the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Du, X., Jiang, X., Lin, J. et al. Multinomial Logistic Factor Regression for Multi-source Functional Block-wise Missing Data. Psychometrika 88, 975–1001 (2023). https://doi.org/10.1007/s11336-023-09918-5
DOI: https://doi.org/10.1007/s11336-023-09918-5
Keywords
- Multi-source functional block-wise missing data
- Multi-source functional principal component analysis (MFPCA)
- Multi-source principal component scores
- Multiple-set canonical correlation analysis (MCCA)
- Canonical scores
- Conditional mean imputation
- Multiple block-wise imputation
- Multinomial Logistic factor regression model
- ADNI