Abstract
In this article, we present an optimal edge-weighted graph semantic correlation (EWGSC) framework for multi-view feature representation learning. Unlike most existing multi-view representation methods, the EWGSC framework jointly exploits local structural information and global correlation in the multi-view feature spaces, yielding a new, high-quality multi-view feature representation. Specifically, we first conceptualize and develop a novel edge-weighted graph model that preserves the local structural information in each of the multi-view feature spaces. The explored structural information is then integrated with a semantic correlation algorithm, labeled multiple canonical correlation analysis (LMCCA), to form a unified platform that jointly exploits local and global relations across the multi-view feature spaces. We further theoretically verify the relation between the upper limit on the number of projected dimensions and the optimal solution to the multi-view feature representation problem. To validate the effectiveness and generality of the proposed framework, we conduct experiments on five datasets of different scales: visual-based (University of California Irvine (UCI) iris database, Olivetti Research Lab (ORL) face database, and Caltech 256 database), text-image-based (Wiki database), and video-based (Ryerson Multimedia Lab (RML) audio-visual emotion database) examples. The experimental results show that the proposed framework outperforms state-of-the-art algorithms on multi-view feature representation.
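The abstract does not specify how EWGSC assigns its edge weights, so the sketch below substitutes a common choice: a symmetric k-nearest-neighbour graph with heat-kernel weights, as used in locality preserving projections. It builds the graph for a single view and checks the standard identity that ties such a graph to a local-structure penalty: sum_ij W_ij (y_i - y_j)^2 = 2 y^T L y, where L = D - W and D is the diagonal degree matrix. The function names and the parameters `k` and `t` are hypothetical illustrations, not the authors' implementation.

```python
import math


def edge_weighted_knn_graph(X, k=2, t=1.0):
    """Symmetric kNN graph with heat-kernel edge weights (ASSUMED weighting,
    not the EWGSC model itself).

    X: feature vectors from ONE view (list of lists of floats).
    k: neighbours per sample (hypothetical parameter).
    t: heat-kernel bandwidth (hypothetical parameter).
    Returns an n x n weight matrix W as nested lists.
    """
    n = len(X)
    # Pairwise squared Euclidean distances within this view.
    sq = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
          for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The k closest other samples (self excluded).
        nbrs = sorted((j for j in range(n) if j != i), key=lambda j: sq[i][j])[:k]
        for j in nbrs:
            # Keep the edge if either endpoint selects the other (symmetric W).
            W[i][j] = W[j][i] = math.exp(-sq[i][j] / t)
    return W


def laplacian(W):
    """Graph Laplacian L = D - W, with D the diagonal matrix of row sums."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def local_structure_penalty(W, y):
    """sum_ij W_ij (y_i - y_j)^2 for a 1-D projection y of the samples."""
    n = len(W)
    return sum(W[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))


def quadratic_form(L, y):
    """y^T L y."""
    n = len(L)
    return sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # toy single-view data
    W = edge_weighted_knn_graph(X, k=2, t=1.0)
    y = [0.1, 0.2, 0.3, 2.0]  # an arbitrary 1-D projection of the four samples
    print(local_structure_penalty(W, y), 2 * quadratic_form(laplacian(W), y))
```

A small penalty means the projection keeps neighbouring samples close. In graph-regularized CCA variants, a term of this kind is typically computed per view and combined with the correlation objective; how EWGSC combines its graph with LMCCA is not described in this abstract.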
Index Terms
- An Optimal Edge-weighted Graph Semantic Correlation Framework for Multi-view Feature Representation Learning