Multi-view cognition with path search for one-shot part labeling Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-16 Shaowei Wang, Lingling Zhang, Tao Qin, Jun Liu, Yifei Li, Qianying Wang, Qinghua Zheng
The diagram is an abstract form of visual expression in the field of education, which is often used to express complex phenomena and convey logical relationships. In recent years, tasks such as diagram classification and textbook question answering have attracted attention and become a new benchmark for evaluating the complex reasoning ability of models. However, due to the lack of large corpora and
-
TFUT: Task fusion upward transformer model for multi-task learning on dense prediction Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-16 Zewei Xin, Shalayiding Sirejiding, Yuxiang Lu, Yue Ding, Chunlin Wang, Tamam Alsarhan, Hongtao Lu
Transformer-based advancements have shown great promise in solving multi-task learning on dense prediction tasks. Well-designed task interaction modules of these methods further improve performance by effectively transferring contextual information between tasks. However, many of these methods do not leverage the target task to guide contextual information from the source task. We propose the
-
Static graph convolution with learned temporal and channel-wise graph topology generation for skeleton-based action recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-15 Chuankun Li, Shuai Li, Yanbo Gao, Lijuan Zhou, Wanqing Li
Graph convolutional networks (GCNs) are widely used in skeleton-based action recognition. The graph topology is known to be a vital component of GCNs, and various graph topologies have been proposed for skeleton-based action recognition, mostly falling into two kinds: a predefined topology and a dynamically learned one. The predefined topology is based on human intuition about the skeleton (the connectivity
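The predefined-versus-learned distinction can be made concrete with the basic graph-convolution step itself, in which the adjacency matrix encodes the topology. Below is a minimal pure-Python sketch over an illustrative three-joint chain; all names and values are hypothetical, not taken from the paper:

```python
def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def graph_conv(adj, feats, weight):
    """One graph-convolution step: out = D^-1 (A + I) X W.

    `adj` is the joint-connectivity matrix; swapping a learned matrix
    in for the predefined skeleton changes the topology, not the layer.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]                       # add self-loops
    deg = [sum(row) for row in a_hat]                 # node degrees
    a_norm = [[a_hat[i][j] / deg[i] for j in range(n)] for i in range(n)]
    return matmul(matmul(a_norm, feats), weight)      # aggregate, then project

# Three joints in a chain (e.g. shoulder-elbow-wrist), one feature each.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
out = graph_conv(chain, [[1.0], [2.0], [3.0]], [[1.0]])
```

Each output row mixes a joint's feature with its neighbors'; a dynamically learned `adj` would let the model discover connections beyond the physical skeleton.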
-
FusionDiff: A unified image fusion network based on diffusion probabilistic models Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-10 Zefeng Huang, Shen Yang, Jin Wu, Lei Zhu, Jin Liu
-
CTM: Cross-time temporal module for fine-grained action recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-09 Huifang Qian, Jialun Zhang, Jianping Yi, Zhenyu Shi, Yimin Zhang
Dynamic contextual attribute information in the time dimension is the key to fine-grained action recognition. Temporal contextual relationships cannot be captured by conventional 2D CNNs; 3D CNNs can model local temporal context well, but they are computationally intensive and lack the capability to model global temporal context. A parallel cross-time temporal module (CTM) is proposed in this article, which aims
-
Recognizing facial expressions based on pyramid multi-head grid and spatial attention network Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-05 Jianyang Zhang, Wei Wang, Xiangyu Li, Yanjiang Han
Facial Expression Recognition (FER) has garnered considerable interest in the field of computer vision. Being a challenging task, it faces key problems such as inter-class similarity, intra-class variability, and environment sensitivity. Typically, traditional Convolutional Neural Networks (CNNs) are limited by their locality and thus have difficulty learning long-range dependencies between
-
Head pose estimation with uncertainty and an application to dyadic interaction detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-28 Federico Figari Tomenotti, Nicoletta Noceti, Francesca Odone
-
Evolutionary Search via channel attention based parameter inheritance and stochastic uniform sampled training Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-28 Yugang Liao, Junqing Li, Shuwei Wei, Xiumei Xiao
-
Tensor robust PCA with nonconvex and nonlocal regularization Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-27 Xiaoyu Geng, Qiang Guo, Shuaixiong Hui, Ming Yang, Caiming Zhang
Tensor robust principal component analysis (TRPCA) is a classical way for low-rank tensor recovery, which minimizes the convex surrogate of tensor rank by shrinking each tensor singular value equally. However, for real-world visual data, large singular values represent more significant information than small singular values. In this paper, we propose a nonconvex TRPCA (N-TRPCA) model based on the tensor
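The contrast between equal and value-dependent shrinkage can be sketched directly on a vector of singular values. The reweighting rule below is only illustrative of the nonconvex idea (shrink large, informative singular values less), not the paper's exact surrogate:

```python
def soft_threshold(sigmas, tau):
    # Convex nuclear-norm proximal step: every singular value
    # is shrunk by the same amount tau.
    return [max(s - tau, 0.0) for s in sigmas]

def reweighted_threshold(sigmas, tau, eps=1e-6):
    # Nonconvex-style reweighting (illustrative): the effective
    # threshold tau / s is small for large singular values,
    # so significant information is preserved.
    return [max(s - tau / (s + eps), 0.0) for s in sigmas]

sigmas = [5.0, 1.0, 0.2]
equal = soft_threshold(sigmas, 0.5)              # [4.5, 0.5, 0.0]
reweighted = reweighted_threshold(sigmas, 0.5)   # keeps more of the 5.0
```

Both rules zero out the smallest value, but the reweighted rule barely touches the dominant singular value that equal shrinkage cuts by the full tau.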
-
A novel camera calibration method based on known rotations and translations Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-27 Zhangfei Chen, Xuelong Si, Dan Wu, Fengnian Tian, Zhenxing Zheng, Renfu Li
It is often difficult to align the camera optical center with the rotation center of the motion mechanism during active vision calibration, and this misalignment can lead to inaccurate constraint equations and calibration results. To circumvent such issues, this paper proposes a novel method for active vision camera calibration and facilitates its real-world implementation. In this method
-
SlowFastFormer for 3D human pose estimation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-24 Lu Zhou, Yingying Chen, Jinqiao Wang
3D human pose estimation in videos aims at locating the human joints in 3D space given a temporal sequence. Motion information and skeleton context are two significant elements for pose estimation in videos. In this paper, we propose a SlowFastFormer (slow-fast transformer) network in which two branches with different input rates are combined to encode these two kinds of context. For the
-
Multi-guided-based image matting via boundary detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-19 Guilin Yao, Anming Sun
Existing automatic matting methods tend to obtain alpha mattes directly from the RGB image using semantic segmentation networks, but relying solely on segmentation to achieve high-quality estimation is usually unrealistic. To address this issue, we propose a multi-guided-based image matting (MGBMatting) model that utilizes boundary information and semantic features as comprehensive and sufficient
-
The 2023 video similarity dataset and challenge Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-19 Ed Pizzi, Giorgos Kordopatis-Zilos, Hiral Patel, Gheorghe Postelnicu, Sugosh Nagavara Ravindra, Akshay Gupta, Symeon Papadopoulos, Giorgos Tolias, Matthijs Douze
This work introduces a dataset, benchmark, and challenge for the problem of video copy tracing. There are two related tasks: determining whether a query video shares content with a reference video (“detection”) and temporally localizing the shared content within each video (“localization”). The benchmark is designed to evaluate methods on these two tasks. It simulates a realistic needle-in-haystack
-
GaitSCM: Causal representation learning for gait recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-19 Wei Huo, Ke Wang, Jun Tang, Nian Wang, Dong Liang
Gait recognition is a promising biometric technology that aims to identify the target subject via their walking pattern. Most existing appearance-based methods focus on learning discriminative spatio-temporal representations from gait silhouettes. However, these methods pay less attention to probing the causality between identity factors and identity labels, which often misleads the model to learn gait representations
-
Visual tracking in camera-switching outdoor sport videos: Benchmark and baselines for skiing Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-19 Matteo Dunnhofer, Christian Micheloni
-
Domain-aware triplet loss in domain generalization Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-16 Kaiyu Guo, Brian C. Lovell
-
Scribble-based complementary graph reasoning network for weakly supervised salient object detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-16 Shuang Liang, Zhiqi Yan, Chi Xie, Hongming Zhu, Jiewen Wang
Current salient object detection (SOD) methods rely heavily on accurate pixel-level annotations. To reduce the annotation workload, some scribble-based methods have emerged. Recent works address the sparse scribble annotations by introducing auxiliary information and enhancing local features. However, the impact of long-range dependence between pixels on energy propagation and model performance has
-
Learning single and multi-scene camera pose regression with transformer encoders Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-15 Yoli Shavit, Ron Ferens, Yosi Keller
Contemporary state-of-the-art localization methods perform feature matching against a structured scene model or learn to regress the scene 3D coordinates. The resulting matches between 2D query pixels and 3D scene coordinates are used to estimate the camera pose using PnP and RANSAC, requiring the camera intrinsics for both the query and reference images. An alternative approach is to directly regress
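The role of the camera intrinsics in the matching-based pipeline can be illustrated by the inlier test at the core of PnP with RANSAC: project each 3D scene point with a candidate pose and count the matches whose reprojection error is small. A self-contained sketch with toy, hypothetical values:

```python
import math

def project(K, R, t, X):
    # Camera-frame point Xc = R X + t, then pinhole projection
    # through the intrinsics K.
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
    v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
    return u, v

def count_inliers(K, R, t, pts3d, pts2d, thresh=2.0):
    # RANSAC scores a candidate pose by how many 2D-3D matches
    # reproject within `thresh` pixels of the observed pixel.
    inliers = 0
    for X, (u_obs, v_obs) in zip(pts3d, pts2d):
        u, v = project(K, R, t, X)
        if math.hypot(u - u_obs, v - v_obs) < thresh:
            inliers += 1
    return inliers

K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]  # toy intrinsics
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]        # identity rotation
t = [0.0, 0.0, 5.0]
pts3d = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pts2d = [(50.0, 50.0), (70.0, 50.0), (0.0, 0.0)]               # last match is wrong
score = count_inliers(K, R, t, pts3d, pts2d)
```

Absolute pose regression, the alternative the abstract goes on to describe, skips this geometric machinery and predicts the pose directly from the image.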
-
Cascade transformers with dynamic attention for video question answering Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-15 Yimin Jiang, Tingfei Yan, Mingze Yao, Huibing Wang, Wenzhe Liu
Visual question answering (VQA) has become a hot research topic in recent years, with the challenging goal of correctly answering questions about videos or images. However, existing VQA models are mostly aimed at answering questions about images and perform poorly in the video question answering (VideoQA) domain. VideoQA needs to simultaneously consider the correlations between video frames and the dynamic
-
Exploiting multimodal synthetic data for egocentric human-object interaction detection in an industrial scenario Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-13 Rosario Leonardi, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella
In this paper, we tackle the problem of Egocentric Human-Object Interaction (EHOI) detection in an industrial setting. To overcome the lack of public datasets in this context, we propose a pipeline and a tool for generating synthetic images of EHOIs paired with several annotations and data signals (e.g., depth maps or segmentation masks). Using the proposed pipeline, we present a new multimodal dataset
-
End-to-end dense video grounding via parallel regression Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-07 Fengyuan Shi, Weilin Huang, Limin Wang
Video grounding aims to localize the corresponding moment in an untrimmed video given a sentence description. Existing methods often address this task in an indirect “one-to-many” way, i.e., predicting more than one proposal for one sentence description, by casting it as a propose-and-match or fusion-and-detection problem. Solving these surrogate problems often requires sophisticated label assignment
-
FAM: Improving columnar vision transformer with feature attention mechanism Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-06 Lan Huang, Xingyu Bai, Jia Zeng, Mengqiang Yu, Wei Pang, Kangping Wang
Vision Transformer has garnered outstanding performance in visual tasks due to its capability for global modeling of image information. However, during the self-attention computation of image tokens, a common issue of attention map homogenization arises, impacting the final performance of the model as attention maps propagate through feature maps layer by layer. In this research, we propose a token-based
-
Background no more: Action recognition across domains by causal interventions Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-28 Sarah Rastegar, Hazel Doughty, Cees G.M. Snoek
We aim to recognize actions under an appearance distribution shift between a source training domain and a target test domain. To enable such video domain generalization, our key idea is to intervene on the action to remove the confounding effect of the domain background on the class label using causal inference. Towards this, we propose to learn a causally debiased model on a source domain that intervenes
-
Simple contrastive learning in a self-supervised manner for robust visual question answering Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-27 Shuwen Yang, Luwei Xiao, Xingjiao Wu, Junjie Xu, Linlin Wang, Liang He
Recent observations have revealed that Visual Question Answering models are susceptible to learning the spurious correlations formed by dataset biases, i.e., the language priors, instead of the intended solution. For instance, given a question and a related image, some VQA systems are prone to provide the frequently occurring answer in the dataset while disregarding the image content. Such a preferred
-
Learning key lines for multi-object tracking Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-20 Yi-Fan Li, Hong-Bing Ji, Xi Chen, Yong-Liang Yang, Yu-Kun Lai
Most online multi-object tracking methods utilize bounding boxes and center points inherited from detectors as the base models to represent targets. Limited performance is obtained with these base models alone for tracking. Complex networks are generally applied on top to extract high-level discriminative features such as appearance embeddings and motion predictions for data association. However, the
-
SpATr: MoCap 3D human action recognition based on spiral auto-encoder and transformer network Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-17 Hamza Bouzid, Lahoucine Ballihi
Recent technological advancements have significantly expanded the potential of human action recognition through harnessing the power of 3D data. This data provides a richer understanding of actions, including depth information that enables more accurate analysis of spatial and temporal characteristics. In this context, we study the challenge of 3D human action recognition. Unlike prior methods that
-
Combinational sign language recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-17 Liqing Gao, Wei Feng, Fan Lyu, Liang Wan
Traditional Sign Language Recognition (SLR) suffers from the scale limitation of SL datasets, which may lead to over-fitting in narrow contexts and applications. In this paper, to solve this problem, we propose, for the first time, a Combinational Sign Language Recognition (CombSLR) framework, which can serve as an augmentation to extend existing datasets by combining continuous videos (called Template)
-
MAEDAY: MAE for few- and zero-shot AnomalY-Detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-16 Eli Schwartz, Assaf Arbelle, Leonid Karlinsky, Sivan Harary, Florian Scheidegger, Sivan Doveh, Raja Giryes
We propose using a Masked Auto-Encoder (MAE), a transformer model self-supervisedly trained on image inpainting, for anomaly detection (AD), under the assumption that anomalous regions are harder to reconstruct than normal regions. MAEDAY is the first image-reconstruction-based anomaly detection method that utilizes a pre-trained model, enabling its use for Few-Shot Anomaly Detection (FSAD). We also show the
-
Quantifying model uncertainty for semantic segmentation of Fluorine-19 MRI using stochastic gradient MCMC Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-16 Masoumeh Javanbakhat, Ludger Starke, Sonia Waiczies, Christoph Lippert
Fluorine-19 (¹⁹F) MRI is an emerging theranostic tool for studying diseases and treatments simultaneously, particularly in challenging neuroinflammatory conditions. However, the low signal-to-noise ratio (SNR) of ¹⁹F MRI necessitates computational methods to reliably detect ¹⁹F signal regions and segment these from the background. In this study, we demonstrate that Bayesian fully convolutional neural networks
-
Human-Scene Network: A novel baseline with self-rectifying loss for weakly supervised video anomaly detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-15 Snehashis Majhi, Rui Dai, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, François Brémond
Video anomaly detection in surveillance systems with only video-level labels is challenging. This is due to (i) the complex integration of a large variety of scenarios, including human- and scene-based anomalies characterized by subtle or sharp spatio-temporal cues in real-world videos, and (ii) sub-optimal optimization between normal and anomaly instances under weak supervision. In this paper, we
-
Exploring using jigsaw puzzles for out-of-distribution detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-15 Yeonguk Yu, Sungho Shin, Minhwan Ko, Kyoobin Lee
Out-of-distribution (OOD) detection is the binary classification task of deciding whether given data comes from outside the training distribution. Previous studies proposed outlier exposure (OE), which trains the model on an outlier dataset designed to represent potential future OOD data, thereby enhancing OOD detection performance. However, obtaining an outlier dataset representing all possible future OOD data can
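For context, whatever score an OOD detector produces, the final decision is typically a threshold on a confidence measure. The maximum-softmax-probability baseline below is a common reference point, sketched here as an assumption about the setup rather than the paper's OE method:

```python
import math

def softmax(logits):
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_ood(logits, threshold=0.5):
    # Flag an input as out-of-distribution when the classifier's
    # top softmax probability falls below the threshold.
    return max(softmax(logits)) < threshold

confident = is_ood([10.0, 0.0, 0.0])   # peaked logits -> in-distribution
uncertain = is_ood([0.0, 0.0, 0.0])    # uniform logits -> flagged as OOD
```

Outlier exposure improves such detectors by teaching the model to emit low-confidence scores on an auxiliary outlier set during training.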
-
Domain generalized federated learning for Person Re-identification Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-15 Fangyi Liu, Mang Ye, Bo Du
In the field of Person Re-identification (ReID), addressing the demands of practical applications in diverse and uncontrollable unseen domains necessitates a focus on Domain Generalization (DG). However, when tackling DG for human-related tasks, the growing awareness of privacy introduces new challenges. Privacy concerns often prevent the sharing of local datasets for global learning, and this limitation
-
Survey on fast dense video segmentation techniques Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-14 Quentin Monnier, Tania Pouli, Kidiyo Kpalma
Semantic segmentation aims at classifying image pixels according to given categories. Deep learning approaches have proven to be very effective for this task. However, extensions to video content are more challenging, typically requiring more complex architectures given the temporal constraints and the additional data that video introduces. At the same time, video applications tend to require real-time
-
Rethink arbitrary style transfer with transformer and contrastive learning Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-10 Zhanjie Zhang, Jiakai Sun, Guangyuan Li, Lei Zhao, Quanwei Zhang, Zehua Lan, Haolin Yin, Wei Xing, Huaizhong Lin, Zhiwen Zuo
Arbitrary style transfer holds widespread attention in research and boasts numerous practical applications. The existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce an innovative technique to improve the quality
-
Transformer-based assignment decision network for multiple object tracking Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-09 Athena Psalta, Vasileios Tsironis, Konstantinos Karantzalos
Data association is a crucial component for any multiple object tracking (MOT) method that follows the tracking-by-detection paradigm. To generate complete trajectories such methods employ a data association process to establish assignments between detections and existing targets during each timestep. Recent data association approaches try to solve either a multi-dimensional linear assignment task
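The assignment step the paper learns can be contrasted with its classical counterpart: given a track-to-detection cost matrix at one timestep, pick a one-to-one matching. A greedy pure-Python sketch with an illustrative gating value (the Hungarian algorithm would give the cost-optimal matching instead):

```python
def greedy_assign(cost, max_cost=1.0):
    # Match each track (row) to at most one detection (column),
    # taking pairs in order of ascending cost and gating out
    # pairs whose cost exceeds `max_cost`.
    pairs = sorted((cost[i][j], i, j)
                   for i in range(len(cost)) for j in range(len(cost[i])))
    used_rows, used_cols, matches = set(), set(), []
    for c, i, j in pairs:
        if c > max_cost:
            break
        if i not in used_rows and j not in used_cols:
            matches.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return matches

matches = greedy_assign([[0.1, 0.9], [0.8, 0.2]])   # both tracks matched
gated = greedy_assign([[0.1, 2.0], [2.0, 2.0]])     # second track left unmatched
```

Unmatched detections typically spawn new tracks and unmatched tracks are kept alive or terminated, which is where learned association modules aim to outperform fixed cost rules.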
-
Re-scoring using image-language similarity for few-shot object detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-08 Min Jae Jung, Seung Dae Han, Joohee Kim
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modifying the loss function can improve performance. In this paper, we explore leveraging the power of Contrastive Language-Image Pre-training (CLIP) and a hard negative classification loss in a low-data setting. Specifically
-
Video Frame-wise Explanation Driven Contrastive Learning for Procedural Text Generation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-08 Zhihao Wang, Lin Li, Zhongwei Xie, Chuanbo Liu
Procedural text generation from visual observation of instructional videos, such as assembling, biochemical experiments, and cooking, is an essential task for scene understanding and real-world applications. The major difference from general captioning tasks is two-fold: it has a flow of material combination in instructional steps, and the materials change their state through action-involved manipulations
-
Attention-based multimodal image matching Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-07 Aviad Moreshet, Yosi Keller
We propose a method for matching multimodal image patches using a multiscale Transformer-Encoder that focuses on the feature maps of a Siamese CNN. It effectively combines multiscale image embeddings while improving task-specific and appearance-invariant image cues. We also introduce a residual attention architecture that allows for end-to-end training by using a residual connection. To the best of
-
Self-supervised multi-scale semantic consistency regularization for unsupervised image-to-image translation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-07 Heng Zhang, Yi-Jun Yang, Wei Zeng
Unsupervised image-to-image translation aims to learn a domain mapping function that preserves the semantics of an input image while adapting its style to target domains without paired data. However, if there is a large semantic mismatch between the source and target domains, current methods often suffer from semantics distortion. Based on dense self-supervised representation learning, a novel Multi-Scale
-
Simplifying open-set video domain adaptation with contrastive learning Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-07 Giacomo Zara, Victor Guilherme Turrisi da Costa, Subhankar Roy, Paolo Rota, Elisa Ricci
In an effort to reduce annotation costs in action recognition, unsupervised video domain adaptation methods have been proposed that aim to adapt a predictive model from a labelled dataset (i.e., source domain) to an unlabelled dataset (i.e., target domain). In this work we address a more realistic scenario, called open-set video domain adaptation (OUVDA), where the target dataset contains “unknown”
-
Revisiting coarse-to-fine strategy for low-light image enhancement with deep decomposition guided training Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-06 Hai Jiang, Yang Ren, Songchen Han
Previous coarse-to-fine strategies typically spend equal effort on feature extraction and feature reconstruction, and gradually improve the brightness of images from bottom to top, so that computational resources are not used efficiently for restoration. In this paper, we propose a new deep framework for Robust and Fast Low-Light Image Enhancement, dubbed RFLLIE. Specifically, we first use a lightweight
-
GMC: A general framework of multi-stage context learning and utilization for visual detection tasks Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-05 Xuan Wang, Hao Tang, Zhigang Zhu
Various kinds of contextual information have been employed by many approaches for visual detection tasks. However, most of the existing approaches only focus on specific context for specific tasks. In this paper, GMC, a general framework, is proposed for multi-stage context learning and utilization, with various deep network architectures for various visual detection tasks. The GMC framework encompasses three
-
Towards efficient image and video style transfer via distillation and learnable feature transformation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-02 Jing Huo, Meihao Kong, Wenbin Li, Jing Wu, Yu-Kun Lai, Yang Gao
Despite the recent rapid development of neural style transfer, existing style transfer methods are still somewhat inefficient or have a large model size, which limits their application on computational resource limited devices. The major problem lies in that they usually adopt a pre-trained VGG-19 backbone which is relatively large or the feature transformation module is computationally heavy. To address
-
Enhancing video anomaly detection with learnable memory network: A new approach to memory-based auto-encoders Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-01 Zhiqiang Wang, Xiaojing Gu, Xingsheng Gu, Jingyu Hu
The aim of video anomaly detection is to detect anomalous events in a video sequence. In an unsupervised setting, enhancing detection accuracy hinges on the ability to learn normal features during the training phase and subsequently generate large errors when abnormal video frames are encountered during the testing phase. The transformer is an innovative neural network that utilizes a self-attention
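The train-on-normal, score-by-error recipe described here reduces at test time to a per-frame reconstruction-error threshold. A minimal sketch with illustrative values; the paper's contribution (the learnable memory) lives inside the auto-encoder and is not modeled here:

```python
def reconstruction_error(frame, recon):
    # Mean squared error between an input frame and the
    # auto-encoder's reconstruction (frames as flat pixel lists).
    return sum((a - b) ** 2 for a, b in zip(frame, recon)) / len(frame)

def is_anomalous(frame, recon, threshold=0.25):
    # The model only learned to reconstruct normal frames, so a
    # large reconstruction error flags a likely anomaly.
    return reconstruction_error(frame, recon) > threshold

normal = is_anomalous([0.2, 0.4, 0.6], [0.21, 0.39, 0.6])   # small error
abnormal = is_anomalous([0.9, 0.1, 0.8], [0.2, 0.5, 0.2])   # large error
```

The unsupervised difficulty the abstract points to is exactly calibrating this gap: the auto-encoder must generalize over normal frames without also reconstructing abnormal ones well.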
-
Deep parametric Retinex decomposition model for low-light image enhancement Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-01 Xiaofang Li, Weiwei Wang, Xiangchu Feng, Min Li
Images captured under low light conditions often suffer from various degradations. The Retinex models are highly effective in enhancing low-light images. The analytical optimization models are interpretable but inflexible to various scenes. The data-driven learning models are flexible to various scenes but less interpretable. To reconcile the advantages of both, we propose a parametric Retinex model
-
MERLIN-Seg: Self-supervised despeckling for label-efficient semantic segmentation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-01 Emanuele Dalsasso, Clément Rambour, Nicolas Trouvé, Nicolas Thome
-
Space–time recurrent memory network Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-30 Hung Nguyen, Chanho Kim, Fuxin Li
-
Temporal adaptive feature pyramid network for action detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-24 Xuezhi Xiang, Hang Yin, Yulong Qiao, Abdulmotaleb El Saddik
-
CPRNC: Channels pruning via reverse neuron crowding for model compression Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-23 Pingfan Wu, Hengyi Huang, Han Sun, Dong Liang, Ningzhong Liu
Channel pruning is an efficient technique for model compression, removing redundant parts of a convolutional neural network with minor degradation in classification accuracy. Previous criteria of channel pruning ignore neurons’ intrinsic relationship and the high correlation with input samples. Inspired by the visual crowding phenomenon in neuroscience, this paper presents a novel channel pruning method
-
Unsupervised deep learning of foreground objects from low-rank and sparse dataset Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-18 Keita Takeda, Tomoya Sakai
-
Hierarchical compositional representations for few-shot action recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-13 Changzhen Li, Jie Zhang, Shuzhe Wu, Xin Jin, Shiguang Shan
Recently action recognition has received more and more attention for its comprehensive and practical applications in intelligent surveillance and human–computer interaction. However, few-shot action recognition has not been well explored and remains challenging because of data scarcity. In this paper, we propose a novel hierarchical compositional representations (HCR) learning approach for few-shot
-
On the coherency of quantitative evaluation of visual explanations Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-18 Benjamin Vandersmissen, José Oramas
Recent years have shown an increased development of methods for justifying the predictions of neural networks through visual explanations. These explanations usually take the form of heatmaps which assign a saliency (or relevance) value to each pixel of the input image that expresses how relevant the pixel is for the prediction of a label. Complementing this development, evaluation methods have been
-
Semantic-aware Transformer for shadow detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-18 Kai Zhou, Jing-Long Fang, Wen Wu, Yan-Li Shao, Xing-Qi Wang, Dan Wei
Shadow detection is significant for scene understanding. Ambiguities in a shadow image, such as shadow-like non-shadow regions and shadow regions with non-shadow patterns, are still very challenging for prevalent CNN-based methods. This work attempts to alleviate this problem from a new perspective of shape semantics, and then proposes a Semantic-aware Transformer (SaT) in a multi-task learning manner
-
A novel slime mold algorithm for grayscale and color image contrast enhancement Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-16 Guoyuan Ma, Xiaofeng Yue, Juan Zhu, Zeyuan Liu, Zongheng Zhang, Yuan Zhou, Chang Li
Image enhancement is a key step in image pre-processing. To address the low quality and poor visual effect of images captured under low-illumination conditions, this paper proposes an image enhancement method based on a slime mold algorithm with a hyperbolic oscillation factor and quadratic interpolation (SSMA), which dynamically adjusts the grayscale curve via an incomplete beta function. The new strategy mainly
-
GradPaint: Gradient-guided inpainting with diffusion models Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-17 Asya Grechka, Guillaume Couairon, Matthieu Cord
Denoising Diffusion Probabilistic Models (DDPMs) have recently achieved remarkable results in conditional and unconditional image generation. The pre-trained models can be adapted without further training to different downstream tasks, by guiding their iterative denoising process at inference time to satisfy additional constraints. For the specific task of image inpainting, the current guiding mechanism
-
Fourier analysis on robustness of graph convolutional neural networks for skeleton-based action recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-11 Nariki Tanaka, Hiroshi Kera, Kazuhiko Kawamoto
Using Fourier analysis, we explore the robustness and vulnerability of graph convolutional neural networks (GCNs) for skeleton-based action recognition. We adopt a joint Fourier transform (JFT), a combination of the graph Fourier transform (GFT) and the discrete Fourier transform (DFT), to examine the robustness of adversarially-trained GCNs against adversarial attacks and common corruptions. Experimental
-
Enhancing image-based facial expression recognition through muscle activation-based facial feature extraction Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-11 Manuel A. Solis-Arrazola, Raul E. Sanchez-Yañez, Carlos H. Garcia-Capulin, Horacio Rostro-Gonzalez
This article introduces a non-intrusive method to estimate facial muscle activity from images, diverging from conventional electrode-based approaches. Our methodology capitalizes on an inclusive set of features encompassing a diverse range of facial muscles, often overlooked in research, thus significantly expanding the scope of analyzing muscle activity within facial expressions. Our method is based
-
PPformer: Using pixel-wise and patch-wise cross-attention for low-light image enhancement Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-15 Jiachen Dang, Yong Zhong, Xiaolin Qin
Recently, transformer-based methods have shown strong competition compared to CNN-based methods on the low-light image enhancement task, by employing the self-attention for feature extraction. Transformer-based methods perform well in modeling long-range pixel dependencies, which are essential for low-light image enhancement to achieve better lighting, natural colors, and higher contrast. However,