Multi-view cognition with path search for one-shot part labeling Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-16 Shaowei Wang, Lingling Zhang, Tao Qin, Jun Liu, Yifei Li, Qianying Wang, Qinghua Zheng
The diagram is an abstract form of visual expression in the field of education, which is often used to express complex phenomena and convey logical relationships. In recent years, tasks such as diagram classification and textbook question answering have attracted attention and become a new benchmark for evaluating the complex reasoning ability of models. However, due to the lack of large corpora and
-
TFUT: Task fusion upward transformer model for multi-task learning on dense prediction Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-16 Zewei Xin, Shalayiding Sirejiding, Yuxiang Lu, Yue Ding, Chunlin Wang, Tamam Alsarhan, Hongtao Lu
Transformer-based advancements have shown great promise in solving multi-task learning on dense prediction tasks. Well-designed task interaction modules of these methods further improve performance by effectively transferring contextual information between tasks. However, many of these methods do not leverage the target task to guide contextual information from the source task. We propose the
-
Static graph convolution with learned temporal and channel-wise graph topology generation for skeleton-based action recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-15 Chuankun Li, Shuai Li, Yanbo Gao, Lijuan Zhou, Wanqing Li
Graph convolutional networks (GCNs) are widely used in skeleton-based action recognition. The graph topology is known to be a vital component of GCNs, and various graph topologies have been proposed for skeleton-based action recognition, mostly falling into two kinds: a predefined topology and a dynamically learned one. The predefined topology is based on human intuition about the skeleton (the connectivity
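The predefined-versus-learned distinction can be made concrete with the basic graph-convolution step itself, in which the adjacency matrix encodes the topology. Below is a minimal pure-Python sketch over an illustrative three-joint chain; all names and values are hypothetical, not taken from the paper:

```python
def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def graph_conv(adj, feats, weight):
    """One graph-convolution step: out = D^-1 (A + I) X W.

    `adj` is the joint-connectivity matrix; swapping a learned matrix
    in for the predefined skeleton changes the topology, not the layer.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]                       # add self-loops
    deg = [sum(row) for row in a_hat]                 # node degrees
    a_norm = [[a_hat[i][j] / deg[i] for j in range(n)] for i in range(n)]
    return matmul(matmul(a_norm, feats), weight)      # aggregate, then project

# Three joints in a chain (e.g. shoulder-elbow-wrist), one feature each.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
out = graph_conv(chain, [[1.0], [2.0], [3.0]], [[1.0]])
```

Each output row mixes a joint's feature with its neighbors'; a dynamically learned `adj` would let the model discover connections beyond the physical skeleton.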
-
FusionDiff: A unified image fusion network based on diffusion probabilistic models Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-10 Zefeng Huang, Shen Yang, Jin Wu, Lei Zhu, Jin Liu
-
CTM: Cross-time temporal module for fine-grained action recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-09 Huifang Qian, Jialun Zhang, Jianping Yi, Zhenyu Shi, Yimin Zhang
Dynamic contextual attribute information in the time dimension is the key to fine-grained action recognition. Temporal contextual relationships cannot be captured by conventional 2D CNNs; 3D CNNs can model local temporal context well, but they are computationally intensive and lack the capability to model global temporal context. A parallel cross-time temporal module (CTM) is proposed in this article, which aims
-
Recognizing facial expressions based on pyramid multi-head grid and spatial attention network Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-04-05 Jianyang Zhang, Wei Wang, Xiangyu Li, Yanjiang Han
Facial Expression Recognition (FER) has garnered considerable interest in the field of computer vision. Being a challenging task, it faces key problems such as inter-class similarity, intra-class variability, and environment sensitivity. Typically, traditional Convolutional Neural Networks (CNNs) are limited by their locality and thus have difficulty learning long-range dependencies between
-
Head pose estimation with uncertainty and an application to dyadic interaction detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-28 Federico Figari Tomenotti, Nicoletta Noceti, Francesca Odone
-
Evolutionary Search via channel attention based parameter inheritance and stochastic uniform sampled training Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-28 Yugang Liao, Junqing Li, Shuwei Wei, Xiumei Xiao
-
Tensor robust PCA with nonconvex and nonlocal regularization Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-27 Xiaoyu Geng, Qiang Guo, Shuaixiong Hui, Ming Yang, Caiming Zhang
Tensor robust principal component analysis (TRPCA) is a classical way for low-rank tensor recovery, which minimizes the convex surrogate of tensor rank by shrinking each tensor singular value equally. However, for real-world visual data, large singular values represent more significant information than small singular values. In this paper, we propose a nonconvex TRPCA (N-TRPCA) model based on the tensor
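The contrast between equal and value-dependent shrinkage can be sketched directly on a vector of singular values. The reweighting rule below is only illustrative of the nonconvex idea (shrink large, informative singular values less), not the paper's exact surrogate:

```python
def soft_threshold(sigmas, tau):
    # Convex nuclear-norm proximal step: every singular value
    # is shrunk by the same amount tau.
    return [max(s - tau, 0.0) for s in sigmas]

def reweighted_threshold(sigmas, tau, eps=1e-6):
    # Nonconvex-style reweighting (illustrative): the effective
    # threshold tau / s is small for large singular values,
    # so significant information is preserved.
    return [max(s - tau / (s + eps), 0.0) for s in sigmas]

sigmas = [5.0, 1.0, 0.2]
equal = soft_threshold(sigmas, 0.5)              # [4.5, 0.5, 0.0]
reweighted = reweighted_threshold(sigmas, 0.5)   # keeps more of the 5.0
```

Both rules zero out the smallest value, but the reweighted rule barely touches the dominant singular value that equal shrinkage cuts by the full tau.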
-
A novel camera calibration method based on known rotations and translations Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-27 Zhangfei Chen, Xuelong Si, Dan Wu, Fengnian Tian, Zhenxing Zheng, Renfu Li
It is often difficult to align the camera optical center with the rotation center of the motion mechanism during active vision calibration, and this misalignment can lead to inaccurate constraint equations and calibration results. To circumvent such issues, this paper proposes a novel method for active vision camera calibration and facilitates its real-world implementation. In this method
-
SlowFastFormer for 3D human pose estimation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-24 Lu Zhou, Yingying Chen, Jinqiao Wang
3D human pose estimation in videos aims at locating the human joints in 3D space given a temporal sequence. Motion information and skeleton context are two significant elements for pose estimation in videos. In this paper, we propose a SlowFastFormer (slow-fast transformer) network in which two branches with different input rates are combined to encode these two kinds of context. For the
-
Multi-guided-based image matting via boundary detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-19 Guilin Yao, Anming Sun
Existing automatic matting methods tend to obtain alpha mattes directly from the RGB image using semantic segmentation networks, but relying solely on segmentation to achieve high-quality estimation is usually unrealistic. To address this issue, we propose a multi-guided-based image matting (MGBMatting) model that utilizes boundary information and semantic features as comprehensive and sufficient
-
The 2023 video similarity dataset and challenge Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-19 Ed Pizzi, Giorgos Kordopatis-Zilos, Hiral Patel, Gheorghe Postelnicu, Sugosh Nagavara Ravindra, Akshay Gupta, Symeon Papadopoulos, Giorgos Tolias, Matthijs Douze
This work introduces a dataset, benchmark, and challenge for the problem of video copy tracing. There are two related tasks: determining whether a query video shares content with a reference video (“detection”) and temporally localizing the shared content within each video (“localization”). The benchmark is designed to evaluate methods on these two tasks. It simulates a realistic needle-in-haystack
-
GaitSCM: Causal representation learning for gait recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-19 Wei Huo, Ke Wang, Jun Tang, Nian Wang, Dong Liang
Gait recognition is a promising biometric technology that aims to identify the target subject via their walking pattern. Most existing appearance-based methods focus on learning discriminative spatio-temporal representations from gait silhouettes. However, these methods pay less attention to probing the causality between identity factors and identity labels, which often misleads the model to learn gait representations
-
Visual tracking in camera-switching outdoor sport videos: Benchmark and baselines for skiing Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-19 Matteo Dunnhofer, Christian Micheloni
-
Domain-aware triplet loss in domain generalization Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-16 Kaiyu Guo, Brian C. Lovell
-
Scribble-based complementary graph reasoning network for weakly supervised salient object detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-16 Shuang Liang, Zhiqi Yan, Chi Xie, Hongming Zhu, Jiewen Wang
Current salient object detection (SOD) methods rely heavily on accurate pixel-level annotations. To reduce the annotation workload, some scribble-based methods have emerged. Recent works address the sparse scribble annotations by introducing auxiliary information and enhancing local features. However, the impact of long-range dependence between pixels on energy propagation and model performance has
-
Learning single and multi-scene camera pose regression with transformer encoders Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-15 Yoli Shavit, Ron Ferens, Yosi Keller
Contemporary state-of-the-art localization methods perform feature matching against a structured scene model or learn to regress the scene 3D coordinates. The resulting matches between 2D query pixels and 3D scene coordinates are used to estimate the camera pose using PnP and RANSAC, requiring the camera intrinsics for both the query and reference images. An alternative approach is to directly regress
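The role of the camera intrinsics in the matching-based pipeline can be illustrated by the inlier test at the core of PnP with RANSAC: project each 3D scene point with a candidate pose and count the matches whose reprojection error is small. A self-contained sketch with toy, hypothetical values:

```python
import math

def project(K, R, t, X):
    # Camera-frame point Xc = R X + t, then pinhole projection
    # through the intrinsics K.
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
    v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
    return u, v

def count_inliers(K, R, t, pts3d, pts2d, thresh=2.0):
    # RANSAC scores a candidate pose by how many 2D-3D matches
    # reproject within `thresh` pixels of the observed pixel.
    inliers = 0
    for X, (u_obs, v_obs) in zip(pts3d, pts2d):
        u, v = project(K, R, t, X)
        if math.hypot(u - u_obs, v - v_obs) < thresh:
            inliers += 1
    return inliers

K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]  # toy intrinsics
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]        # identity rotation
t = [0.0, 0.0, 5.0]
pts3d = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pts2d = [(50.0, 50.0), (70.0, 50.0), (0.0, 0.0)]               # last match is wrong
score = count_inliers(K, R, t, pts3d, pts2d)
```

Absolute pose regression, the alternative the abstract goes on to describe, skips this geometric machinery and predicts the pose directly from the image.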
-
Cascade transformers with dynamic attention for video question answering Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-15 Yimin Jiang, Tingfei Yan, Mingze Yao, Huibing Wang, Wenzhe Liu
Visual question answering (VQA) has become a hot research topic in recent years, with the challenging goal of correctly answering questions about videos or images. However, existing VQA models are mostly aimed at answering questions about images and perform poorly in the video question answering (VideoQA) domain. VideoQA needs to simultaneously consider the correlations between video frames and the dynamic
-
Exploiting multimodal synthetic data for egocentric human-object interaction detection in an industrial scenario Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-13 Rosario Leonardi, Francesco Ragusa, Antonino Furnari, Giovanni Maria Farinella
In this paper, we tackle the problem of Egocentric Human-Object Interaction (EHOI) detection in an industrial setting. To overcome the lack of public datasets in this context, we propose a pipeline and a tool for generating synthetic images of EHOIs paired with several annotations and data signals (e.g., depth maps or segmentation masks). Using the proposed pipeline, we present a new multimodal dataset
-
End-to-end dense video grounding via parallel regression Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-07 Fengyuan Shi, Weilin Huang, Limin Wang
Video grounding aims to localize the corresponding moment in an untrimmed video given a sentence description. Existing methods often address this task in an indirect “one-to-many” way, i.e., predicting more than one proposal for one sentence description, by casting it as a propose-and-match or fusion-and-detection problem. Solving these surrogate problems often requires sophisticated label assignment
-
FAM: Improving columnar vision transformer with feature attention mechanism Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-06 Lan Huang, Xingyu Bai, Jia Zeng, Mengqiang Yu, Wei Pang, Kangping Wang
Vision Transformer has garnered outstanding performance in visual tasks due to its capability for global modeling of image information. However, during the self-attention computation of image tokens, a common issue of attention map homogenization arises, impacting the final performance of the model as attention maps propagate through feature maps layer by layer. In this research, we propose a token-based
-
Background no more: Action recognition across domains by causal interventions Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-28 Sarah Rastegar, Hazel Doughty, Cees G.M. Snoek
We aim to recognize actions under an appearance distribution shift between a source training domain and a target test domain. To enable such video domain generalization, our key idea is to intervene on the action to remove the confounding effect of the domain background on the class label using causal inference. Towards this, we propose to learn a causally debiased model on a source domain that intervenes
-
Simple contrastive learning in a self-supervised manner for robust visual question answering Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-27 Shuwen Yang, Luwei Xiao, Xingjiao Wu, Junjie Xu, Linlin Wang, Liang He
Recent observations have revealed that Visual Question Answering models are susceptible to learning the spurious correlations formed by dataset biases, i.e., the language priors, instead of the intended solution. For instance, given a question and a related image, some VQA systems are prone to provide the frequently occurring answer in the dataset while disregarding the image content. Such a preferred
-
Learning key lines for multi-object tracking Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-20 Yi-Fan Li, Hong-Bing Ji, Xi Chen, Yong-Liang Yang, Yu-Kun Lai
Most online multi-object tracking methods utilize bounding boxes and center points inherited from detectors as the base models to represent targets. Limited performance is obtained with these base models alone for tracking. Complex networks are generally applied on top to extract high-level discriminative features such as appearance embeddings and motion predictions for data association. However, the
-
SpATr: MoCap 3D human action recognition based on spiral auto-encoder and transformer network Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-17 Hamza Bouzid, Lahoucine Ballihi
Recent technological advancements have significantly expanded the potential of human action recognition through harnessing the power of 3D data. This data provides a richer understanding of actions, including depth information that enables more accurate analysis of spatial and temporal characteristics. In this context, we study the challenge of 3D human action recognition. Unlike prior methods that
-
Combinational sign language recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-17 Liqing Gao, Wei Feng, Fan Lyu, Liang Wan
Traditional Sign Language Recognition (SLR) suffers from the scale limitation of SL datasets, which may lead to over-fitting in narrow contexts and applications. In this paper, to solve this problem, we propose, for the first time, a Combinational Sign Language Recognition (CombSLR) framework, which can serve as an augmentation to extend existing datasets by combining continuous videos (called Template)
-
MAEDAY: MAE for few- and zero-shot AnomalY-Detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-16 Eli Schwartz, Assaf Arbelle, Leonid Karlinsky, Sivan Harary, Florian Scheidegger, Sivan Doveh, Raja Giryes
We propose using a Masked Auto-Encoder (MAE), a transformer model self-supervisedly trained on image inpainting, for anomaly detection (AD), under the assumption that anomalous regions are harder to reconstruct than normal regions. MAEDAY is the first image-reconstruction-based anomaly detection method that utilizes a pre-trained model, enabling its use for Few-Shot Anomaly Detection (FSAD). We also show the
-
Quantifying model uncertainty for semantic segmentation of Fluorine-19 MRI using stochastic gradient MCMC Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-16 Masoumeh Javanbakhat, Ludger Starke, Sonia Waiczies, Christoph Lippert
Fluorine-19 (¹⁹F) MRI is an emerging theranostic tool for studying diseases and treatments simultaneously, particularly in challenging neuroinflammatory conditions. However, the low signal-to-noise ratio (SNR) of ¹⁹F MRI necessitates computational methods to reliably detect ¹⁹F signal regions and segment these from the background. In this study, we demonstrate that Bayesian fully convolutional neural networks
-
Human-Scene Network: A novel baseline with self-rectifying loss for weakly supervised video anomaly detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-15 Snehashis Majhi, Rui Dai, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, François Brémond
Video anomaly detection in surveillance systems with only video-level labels is challenging. This is due to (i) the complex integration of a large variety of scenarios, including human- and scene-based anomalies characterized by subtle or sharp spatio-temporal cues in real-world videos, and (ii) sub-optimal optimization between normal and anomaly instances under weak supervision. In this paper, we
-
Exploring using jigsaw puzzles for out-of-distribution detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-15 Yeonguk Yu, Sungho Shin, Minhwan Ko, Kyoobin Lee
Out-of-distribution (OOD) detection is the binary classification task of deciding whether given data comes from outside the training distribution. Previous studies proposed outlier exposure (OE), which trains the model on an outlier dataset designed to represent potential future OOD data, thereby enhancing OOD detection performance. However, obtaining an outlier dataset representing all possible future OOD data can
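For context, whatever score an OOD detector produces, the final decision is typically a threshold on a confidence measure. The maximum-softmax-probability baseline below is a common reference point, sketched here as an assumption about the setup rather than the paper's OE method:

```python
import math

def softmax(logits):
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_ood(logits, threshold=0.5):
    # Flag an input as out-of-distribution when the classifier's
    # top softmax probability falls below the threshold.
    return max(softmax(logits)) < threshold

confident = is_ood([10.0, 0.0, 0.0])   # peaked logits -> in-distribution
uncertain = is_ood([0.0, 0.0, 0.0])    # uniform logits -> flagged as OOD
```

Outlier exposure improves such detectors by teaching the model to emit low-confidence scores on an auxiliary outlier set during training.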
-
Domain generalized federated learning for Person Re-identification Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-15 Fangyi Liu, Mang Ye, Bo Du
In the field of Person Re-identification (ReID), addressing the demands of practical applications in diverse and uncontrollable unseen domains necessitates a focus on Domain Generalization (DG). However, when tackling DG for human-related tasks, the growing awareness of privacy introduces new challenges. Privacy concerns often prevent the sharing of local datasets for global learning, and this limitation
-
Survey on fast dense video segmentation techniques Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-14 Quentin Monnier, Tania Pouli, Kidiyo Kpalma
Semantic segmentation aims at classifying image pixels according to given categories. Deep learning approaches have proven to be very effective for this task. However, extensions to video content are more challenging, typically requiring more complex architectures given the temporal constraints and the additional data that video introduces. At the same time, video applications tend to require real-time
-
Rethink arbitrary style transfer with transformer and contrastive learning Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-10 Zhanjie Zhang, Jiakai Sun, Guangyuan Li, Lei Zhao, Quanwei Zhang, Zehua Lan, Haolin Yin, Wei Xing, Huaizhong Lin, Zhiwen Zuo
Arbitrary style transfer holds widespread attention in research and boasts numerous practical applications. The existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce an innovative technique to improve the quality
-
Transformer-based assignment decision network for multiple object tracking Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-09 Athena Psalta, Vasileios Tsironis, Konstantinos Karantzalos
Data association is a crucial component for any multiple object tracking (MOT) method that follows the tracking-by-detection paradigm. To generate complete trajectories such methods employ a data association process to establish assignments between detections and existing targets during each timestep. Recent data association approaches try to solve either a multi-dimensional linear assignment task
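The assignment step the paper learns can be contrasted with its classical counterpart: given a track-to-detection cost matrix at one timestep, pick a one-to-one matching. A greedy pure-Python sketch with an illustrative gating value (the Hungarian algorithm would give the cost-optimal matching instead):

```python
def greedy_assign(cost, max_cost=1.0):
    # Match each track (row) to at most one detection (column),
    # taking pairs in order of ascending cost and gating out
    # pairs whose cost exceeds `max_cost`.
    pairs = sorted((cost[i][j], i, j)
                   for i in range(len(cost)) for j in range(len(cost[i])))
    used_rows, used_cols, matches = set(), set(), []
    for c, i, j in pairs:
        if c > max_cost:
            break
        if i not in used_rows and j not in used_cols:
            matches.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return matches

matches = greedy_assign([[0.1, 0.9], [0.8, 0.2]])   # both tracks matched
gated = greedy_assign([[0.1, 2.0], [2.0, 2.0]])     # second track left unmatched
```

Unmatched detections typically spawn new tracks and unmatched tracks are kept alive or terminated, which is where learned association modules aim to outperform fixed cost rules.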
-
Re-scoring using image-language similarity for few-shot object detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-08 Min Jae Jung, Seung Dae Han, Joohee Kim
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modifying the loss function can improve performance. In this paper, we explore leveraging the power of Contrastive Language-Image Pre-training (CLIP) and a hard negative classification loss in a low-data setting. Specifically
-
Video Frame-wise Explanation Driven Contrastive Learning for Procedural Text Generation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-08 Zhihao Wang, Lin Li, Zhongwei Xie, Chuanbo Liu
Procedural text generation from visual observation of instructional videos, such as assembling, biochemical experiments, and cooking, is an essential task for scene understanding and real-world applications. The major difference from general captioning tasks is two-fold: it has a flow of material combination in instructional steps, and the materials change their state through action-involved manipulations
-
Attention-based multimodal image matching Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-07 Aviad Moreshet, Yosi Keller
We propose a method for matching multimodal image patches using a multiscale Transformer-Encoder that focuses on the feature maps of a Siamese CNN. It effectively combines multiscale image embeddings while improving task-specific and appearance-invariant image cues. We also introduce a residual attention architecture that allows for end-to-end training by using a residual connection. To the best of
-
Self-supervised multi-scale semantic consistency regularization for unsupervised image-to-image translation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-07 Heng Zhang, Yi-Jun Yang, Wei Zeng
Unsupervised image-to-image translation aims to learn a domain mapping function that preserves the semantics of an input image while adapting its style to target domains without paired data. However, if there is a large semantic mismatch between the source and target domains, current methods often suffer from semantics distortion. Based on dense self-supervised representation learning, a novel Multi-Scale
-
Simplifying open-set video domain adaptation with contrastive learning Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-07 Giacomo Zara, Victor Guilherme Turrisi da Costa, Subhankar Roy, Paolo Rota, Elisa Ricci
In an effort to reduce annotation costs in action recognition, unsupervised video domain adaptation methods have been proposed that aim to adapt a predictive model from a labelled dataset (i.e., source domain) to an unlabelled dataset (i.e., target domain). In this work we address a more realistic scenario, called open-set video domain adaptation (OUVDA), where the target dataset contains “unknown”
-
Revisiting coarse-to-fine strategy for low-light image enhancement with deep decomposition guided training Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-06 Hai Jiang, Yang Ren, Songchen Han
Previous coarse-to-fine strategies typically spend equal effort on feature extraction and feature reconstruction, and gradually improve the brightness of images from bottom to top, so that computational resources are not used efficiently for restoration. In this paper, we propose a new deep framework for Robust and Fast Low-Light Image Enhancement, dubbed RFLLIE. Specifically, we first use a lightweight
-
GMC: A general framework of multi-stage context learning and utilization for visual detection tasks Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-05 Xuan Wang, Hao Tang, Zhigang Zhu
Various kinds of contextual information have been employed by many approaches for visual detection tasks. However, most of the existing approaches only focus on specific context for specific tasks. In this paper, GMC, a general framework, is proposed for multi-stage context learning and utilization, with various deep network architectures for various visual detection tasks. The GMC framework encompasses three
-
Towards efficient image and video style transfer via distillation and learnable feature transformation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-02 Jing Huo, Meihao Kong, Wenbin Li, Jing Wu, Yu-Kun Lai, Yang Gao
Despite the recent rapid development of neural style transfer, existing style transfer methods are still somewhat inefficient or have a large model size, which limits their application on computational resource limited devices. The major problem lies in that they usually adopt a pre-trained VGG-19 backbone which is relatively large or the feature transformation module is computationally heavy. To address
-
Enhancing video anomaly detection with learnable memory network: A new approach to memory-based auto-encoders Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-01 Zhiqiang Wang, Xiaojing Gu, Xingsheng Gu, Jingyu Hu
The aim of video anomaly detection is to detect anomalous events in a video sequence. In an unsupervised setting, enhancing detection accuracy hinges on the ability to learn normal features during the training phase and subsequently generate large errors when abnormal video frames are encountered during the testing phase. The transformer is an innovative neural network that utilizes a self-attention
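The train-on-normal, score-by-error recipe described here reduces at test time to a per-frame reconstruction-error threshold. A minimal sketch with illustrative values; the paper's contribution (the learnable memory) lives inside the auto-encoder and is not modeled here:

```python
def reconstruction_error(frame, recon):
    # Mean squared error between an input frame and the
    # auto-encoder's reconstruction (frames as flat pixel lists).
    return sum((a - b) ** 2 for a, b in zip(frame, recon)) / len(frame)

def is_anomalous(frame, recon, threshold=0.25):
    # The model only learned to reconstruct normal frames, so a
    # large reconstruction error flags a likely anomaly.
    return reconstruction_error(frame, recon) > threshold

normal = is_anomalous([0.2, 0.4, 0.6], [0.21, 0.39, 0.6])   # small error
abnormal = is_anomalous([0.9, 0.1, 0.8], [0.2, 0.5, 0.2])   # large error
```

The unsupervised difficulty the abstract points to is exactly calibrating this gap: the auto-encoder must generalize over normal frames without also reconstructing abnormal ones well.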
-
Deep parametric Retinex decomposition model for low-light image enhancement Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-01 Xiaofang Li, Weiwei Wang, Xiangchu Feng, Min Li
Images captured under low light conditions often suffer from various degradations. The Retinex models are highly effective in enhancing low-light images. The analytical optimization models are interpretable but inflexible to various scenes. The data-driven learning models are flexible to various scenes but less interpretable. To reconcile the advantages of both, we propose a parametric Retinex model
-
MERLIN-Seg: Self-supervised despeckling for label-efficient semantic segmentation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-01 Emanuele Dalsasso, Clément Rambour, Nicolas Trouvé, Nicolas Thome
-
Space–time recurrent memory network Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-30 Hung Nguyen, Chanho Kim, Fuxin Li
-
Temporal adaptive feature pyramid network for action detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-24 Xuezhi Xiang, Hang Yin, Yulong Qiao, Abdulmotaleb El Saddik
-
CPRNC: Channels pruning via reverse neuron crowding for model compression Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-23 Pingfan Wu, Hengyi Huang, Han Sun, Dong Liang, Ningzhong Liu
Channel pruning is an efficient technique for model compression, removing redundant parts of a convolutional neural network with minor degradation in classification accuracy. Previous criteria of channel pruning ignore neurons’ intrinsic relationship and the high correlation with input samples. Inspired by the visual crowding phenomenon in neuroscience, this paper presents a novel channel pruning method
-
Unsupervised deep learning of foreground objects from low-rank and sparse dataset Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-18 Keita Takeda, Tomoya Sakai
-
Hierarchical compositional representations for few-shot action recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-13 Changzhen Li, Jie Zhang, Shuzhe Wu, Xin Jin, Shiguang Shan
Recently action recognition has received more and more attention for its comprehensive and practical applications in intelligent surveillance and human–computer interaction. However, few-shot action recognition has not been well explored and remains challenging because of data scarcity. In this paper, we propose a novel hierarchical compositional representations (HCR) learning approach for few-shot
-
On the coherency of quantitative evaluation of visual explanations Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-18 Benjamin Vandersmissen, José Oramas
Recent years have shown an increased development of methods for justifying the predictions of neural networks through visual explanations. These explanations usually take the form of heatmaps which assign a saliency (or relevance) value to each pixel of the input image that expresses how relevant the pixel is for the prediction of a label. Complementing this development, evaluation methods have been
-
Semantic-aware Transformer for shadow detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-18 Kai Zhou, Jing-Long Fang, Wen Wu, Yan-Li Shao, Xing-Qi Wang, Dan Wei
Shadow detection is significant for scene understanding. Ambiguities in a shadow image, such as shadow-like non-shadow regions and shadow regions with non-shadow patterns, are still very challenging for prevalent CNN-based methods. This work attempts to alleviate this problem from a new perspective of shape semantics, and then proposes a Semantic-aware Transformer (SaT) in a multi-task learning manner
-
A novel slime mold algorithm for grayscale and color image contrast enhancement Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-16 Guoyuan Ma, Xiaofeng Yue, Juan Zhu, Zeyuan Liu, Zongheng Zhang, Yuan Zhou, Chang Li
Image enhancement is a key step in image pre-processing. To address the low quality and poor visual effect of images captured under low-illumination conditions, this paper proposes an image enhancement method based on a slime mold algorithm with a hyperbolic oscillation factor and quadratic interpolation (SSMA), which dynamically adjusts the grayscale curve via an incomplete beta function. The new strategy mainly
-
GradPaint: Gradient-guided inpainting with diffusion models Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-17 Asya Grechka, Guillaume Couairon, Matthieu Cord
Denoising Diffusion Probabilistic Models (DDPMs) have recently achieved remarkable results in conditional and unconditional image generation. The pre-trained models can be adapted without further training to different downstream tasks, by guiding their iterative denoising process at inference time to satisfy additional constraints. For the specific task of image inpainting, the current guiding mechanism
-
Fourier analysis on robustness of graph convolutional neural networks for skeleton-based action recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-11 Nariki Tanaka, Hiroshi Kera, Kazuhiko Kawamoto
Using Fourier analysis, we explore the robustness and vulnerability of graph convolutional neural networks (GCNs) for skeleton-based action recognition. We adopt a joint Fourier transform (JFT), a combination of the graph Fourier transform (GFT) and the discrete Fourier transform (DFT), to examine the robustness of adversarially-trained GCNs against adversarial attacks and common corruptions. Experimental
-
Enhancing image-based facial expression recognition through muscle activation-based facial feature extraction Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-11 Manuel A. Solis-Arrazola, Raul E. Sanchez-Yañez, Carlos H. Garcia-Capulin, Horacio Rostro-Gonzalez
This article introduces a non-intrusive method to estimate facial muscle activity from images, diverging from conventional electrode-based approaches. Our methodology capitalizes on an inclusive set of features encompassing a diverse range of facial muscles, often overlooked in research, thus significantly expanding the scope of analyzing muscle activity within facial expressions. Our method is based
-
PPformer: Using pixel-wise and patch-wise cross-attention for low-light image enhancement Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-15 Jiachen Dang, Yong Zhong, Xiaolin Qin
Recently, transformer-based methods have shown strong competition compared to CNN-based methods on the low-light image enhancement task, by employing the self-attention for feature extraction. Transformer-based methods perform well in modeling long-range pixel dependencies, which are essential for low-light image enhancement to achieve better lighting, natural colors, and higher contrast. However,