-
A ResNet-101 deep learning framework induced transfer learning strategy for moving object detection Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-16 Upasana Panigrahi, Prabodh Kumar Sahoo, Manoj Kumar Panda, Ganapati Panda
Background subtraction is a crucial stage in many visual surveillance systems. The prime objective of any such system is to detect moving objects so that the system can address many real-time challenges. In the last few decades, various methods have been developed to detect moving objects. However, the performance of many existing methods needs further improvement for slow, moderate
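For readers unfamiliar with the task, classical background subtraction maintains a per-pixel background model and flags pixels that deviate from it as moving. The minimal sketch below illustrates only that baseline, not the paper's ResNet-101 transfer-learning method; the threshold and learning-rate values are arbitrary assumptions.

```python
# Minimal sketch of classical background subtraction with a running-average
# background model. Illustrative baseline only; parameter values are assumptions.

def update_background(bg, frame, alpha=0.05):
    """Exponential running average of the background, pixel by pixel."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def foreground_mask(bg, frame, threshold=25):
    """Pixels that deviate from the background model are marked as moving."""
    return [1 if abs(f - b) > threshold else 0 for b, f in zip(bg, frame)]

# Toy 1-D "frames": a static scene, then an object appears at index 2.
bg = [10.0, 10.0, 10.0, 10.0]
frame = [10.0, 10.0, 200.0, 10.0]
mask = foreground_mask(bg, frame)      # → [0, 0, 1, 0]
bg = update_background(bg, frame)      # background slowly absorbs the change
```

Slow-moving objects are exactly the hard case this baseline struggles with: they get absorbed into the background model, which motivates the learned approaches the abstract surveys.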
-
SAKD: Sparse attention knowledge distillation Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-16 Zhen Guo, Pengzhou Zhang, Peng Liang
Deep learning techniques have gained significant interest due to their success in large model scenarios. However, large models often require massive computational resources, which can challenge end devices with limited storage capabilities. Transferring knowledge from big to small models and achieving similar results with limited resources requires further research. Knowledge distillation techniques
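As background, the classic knowledge-distillation objective (Hinton et al.) trains a small student on temperature-softened teacher outputs. The truncated abstract does not specify SAKD's sparse-attention variant, so the sketch below shows only the standard loss; the logits and temperature are illustrative.

```python
import math

# Hedged sketch of the standard knowledge-distillation loss: KL divergence
# between temperature-softened teacher and student distributions.

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) over temperature-softened outputs."""
    p = softmax(teacher_logits, temperature)   # soft targets from the big model
    q = softmax(student_logits, temperature)   # student predictions
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss_same = kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])   # identical logits → 0
loss_diff = kd_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1])   # mismatched → positive
```

In practice this term is combined with the ordinary cross-entropy on hard labels; the soft targets carry the "dark knowledge" about inter-class similarity that the small model could not learn from one-hot labels alone.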
-
A new multi-picture architecture for learned video deinterlacing and demosaicing with parallel deformable convolution and self-attention blocks Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-15 Ronglei Ji, A. Murat Tekalp
Although real-world video deinterlacing and demosaicing are well suited to supervised learning from synthetically degraded data, since the degradation models are known and fixed, learned video deinterlacing and demosaicing have received much less attention than denoising and super-resolution tasks. We propose a new multi-picture architecture for video deinterlacing or demosaicing by
-
Deep learning and genetic algorithm-based ensemble model for feature selection and classification of breast ultrasound images Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-10 Mohsin Furkh Dar, Avatharam Ganivada
-
WaveletFormerNet: A Transformer-based wavelet network for real-world non-homogeneous and dense fog removal Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-10 Shengli Zhang, Zhiyong Tao, Sen Lin
-
A 3D multi-scale CycleGAN framework for generating synthetic PETs from MRIs for Alzheimer's disease diagnosis Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-09 M. Khojaste-Sarakhsi, Seyedhamidreza Shahabi Haghighi, S.M.T. Fatemi Ghomi, Elena Marchiori
This paper proposes a novel framework for generating synthesized PET images from MRIs to fill in missing PETs and help with Alzheimer's disease (AD) diagnosis. This framework employs a 3D multi-scale image-to-image CycleGAN architecture for the end-to-end translation of MRI and PET domains together. A hybrid loss function is also proposed to enforce structural similarity while preserving voxel-wise
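The truncated abstract does not spell out the hybrid loss, so the sketch below is only an illustrative combination of a voxel-wise L1 term with a crude structural term based on local intensity gradients; the `w_struct` weighting and the gradient-difference formulation are assumptions, not the paper's formula.

```python
# Illustrative hybrid loss: voxel-wise L1 plus a simple structural term that
# compares local intensity gradients (1-D toy version of a 3D volume loss).

def hybrid_loss(pred, target, w_struct=0.5):
    # Voxel-wise fidelity: mean absolute error.
    l1 = sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
    # Structural term: mean absolute difference of adjacent-voxel gradients.
    gp = [pred[i + 1] - pred[i] for i in range(len(pred) - 1)]
    gt = [target[i + 1] - target[i] for i in range(len(target) - 1)]
    struct = sum(abs(a - b) for a, b in zip(gp, gt)) / len(gp)
    return (1 - w_struct) * l1 + w_struct * struct

perfect = hybrid_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # → 0.0
offset = hybrid_loss([0.0, 0.0], [1.0, 1.0])              # L1 penalized, structure preserved
```

The design intent mirrors the abstract: a purely voxel-wise loss ignores anatomy-level structure, while a purely structural loss tolerates intensity drift, so the two are balanced.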
-
Mixup Mask Adaptation: Bridging the gap between input saliency and representations via attention mechanism in feature mixup Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-08 Minsoo Kang, Minkoo Kang, Seong-Whan Lee, Suhyun Kim
The inherent complexity and extensive architecture of deep neural networks often lead to overfitting, compromising their ability to generalize to new, unseen data. One of the regularization techniques, data augmentation, is now considered vital to alleviate this, and mixup, which blends pairs of images and labels, has proven effective in enhancing model generalization. Recently, incorporating saliency
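For reference, plain mixup (Zhang et al.) blends two samples and their labels with a Beta-distributed coefficient; the paper's Mixup Mask Adaptation refines this with attention-derived saliency masks, which this minimal sketch does not show.

```python
import random

# Minimal sketch of plain mixup: convex combination of two inputs and their
# one-hot labels with a Beta(alpha, alpha)-distributed coefficient.

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Blend a toy "cat" sample with a toy "dog" sample and their one-hot labels.
x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

Saliency-aware variants replace the scalar `lam` with a spatial mask so that the blended image keeps the discriminative regions of each source, which is the gap the abstract's attention mechanism addresses.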
-
Semantic segmentation of large-scale point clouds by integrating attention mechanisms and transformer models Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-06 Tiebiao Yuan, Yangyang Yu, Xiaolong Wang
-
Detection of dental periapical lesions using retinex based image enhancement and lightweight deep learning model Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-05 Vaishali Latke, Vaibhav Narawade
-
Alignment and fusion for adaptive domain nighttime semantic segmentation Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-03 Bao Zhang, Nianmin Yao, Jian Zhao, Yanan Zhang
In the field of autonomous driving technology, both daytime and nighttime scenes are common. However, due to the poor illumination and difficulty in manual annotation of nighttime images, semantic segmentation of nighttime scenes is more challenging compared to daytime scenes. Therefore, achieving significant progress in nighttime semantic segmentation would greatly enhance the effectiveness of the
-
Comparison of fine-tuning strategies for transfer learning in medical image classification Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-03 Ana Davila, Jacinto Colan, Yasuhisa Hasegawa
-
Robust ensemble person reidentification via orthogonal fusion with occlusion handling Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-03 Syeda Nyma Ferdous, Xin Li
Occlusion remains one of the major challenges in person reidentification (ReID) due to the diversity of poses and the variation of appearances. Developing novel architectures to improve the robustness of occlusion-aware person Re-ID requires new insights, especially on low-resolution edge cameras. We propose a deep ensemble model that harnesses both CNN and Transformer architectures to generate robust
-
Video anomaly detection based on a multi-layer reconstruction autoencoder with a variance attention strategy Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-02 Shifeng Li, Yan Cheng, Liang Zhang, Xi Luo, Ruixuan Zhang
In this paper, we propose a comprehensive framework for detecting anomalies in videos based on autoencoder (AE). Traditional AE models solely rely on input and final reconstruction, potentially limiting their capacity to fully utilize the intermediate neural network layers. To mitigate this limitation, we introduce a novel approach that concurrently trains the model using corresponding intermediate
-
A three-dimensional human motion pose recognition algorithm based on graph convolutional networks Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-30 Linfang Sun, Ningning Li, Guangfeng Zhao, Gang Wang
The task of three-dimensional human motion pose recognition suffers from problems such as target loss, inaccurate target positioning, and high computational complexity. This article designs a recognition evaluation algorithm to address these issues: it designs a LiteHRNet model for extracting skeleton sequences from action videos and proposes a graph convolutional structure that combines residual networks
-
A 3D motion image recognition model based on 3D CNN-GRU model and attention mechanism Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-30 Chen Cheng, Huahu Xu
Moving image recognition has become a well-explored problem in computer vision. However, it is difficult for the traditional convolutional neural network (CNN) model to effectively capture temporal information in motion. Therefore, for better use of video sequence features and to improve the accuracy of action recognition, this paper proposes a three-dimensional CNN (3DCNN) model based on Gated Recurrent
-
A spatiotemporal motion prediction network based on multi-level feature disentanglement Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-29 Suting Chen, Yewen Bo, Xu Wu
-
Siamese network to assess scanner-related contrast variability in MRI Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-29 Matteo Polsinelli, Hongwei Bran Li, Filippo Mignosi, Li Zhang, Giuseppe Placidi
Magnetic Resonance Imaging (MRI) stands as a noninvasive tool for diagnosing and monitoring various diseases. The flexibility of MRI configuration parameters allows for adaptable imaging sequences, and at the same time poses challenges in terms of reproducibility, as variability in imaging sequences leads to significant differences in image contrast. This is one of the major causes that compromise
-
Underwater image quality optimization: Researches, challenges, and future trends Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-27 Mingjie Wang, Keke Zhang, Hongan Wei, Weiling Chen, Tiesong Zhao
Underwater images serve as crucial mediums for conveying marine information. Nevertheless, due to the inherent complexity of the underwater environment, underwater images often suffer from various quality degradation phenomena such as color deviation, low contrast, and non-uniform illumination. These degraded underwater images fail to meet the requirements of underwater computer vision applications
-
Enhancing temporal action localization in an end-to-end network through estimation error incorporation Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-27 Mozhgan Mokari, Khosrow Haj Sadeghi
Temporal action localization presents a significant challenge in computer vision, as the development of an efficient method for this task remains elusive. The objective is to identify human activities within untrimmed videos, determining when and which actions occur in each video. While using trimmed videos could potentially resolve the localization problem and enhance classification accuracy, it is
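As context, localization quality in this task is conventionally scored by temporal IoU between predicted and ground-truth action segments. The sketch below shows that standard matching criterion, not the paper's method; the segment boundaries are illustrative.

```python
# Standard temporal IoU between two (start, end) time segments, the usual
# matching criterion for evaluating temporal action localization.

def temporal_iou(seg_a, seg_b):
    """IoU of two time segments given as (start, end), e.g. in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

iou = temporal_iou((2.0, 8.0), (4.0, 10.0))  # overlap 4 s, union 8 s → 0.5
```

A predicted segment typically counts as a true positive only when its temporal IoU with a ground-truth action exceeds a threshold (commonly 0.5), which is why boundary estimation errors, the subject of this abstract, directly hurt reported mAP.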
-
Semantic segmentation using cross-stage feature reweighting and efficient self-attention Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-26 Yingdong Ma, Xiaobin Lan
Recently, vision transformers have demonstrated strong performance in various computer vision tasks. The success of ViTs can be attributed to their ability to capture long-range dependencies. However, transformer-based approaches often yield segmentation maps with incomplete object structures because of restricted cross-stage information propagation and a lack of low-level details. To address these problems
-
Enhancing fall prediction in the elderly people using LBP features and transfer learning model Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-26 Muhammad Umer, Aisha Ahmed Alarfaj, Ebtisam Abdullah Alabdulqader, Shtwai Alsubai, Lucia Cascone, Fabio Narducci
-
MLCapsNet +: A multi-capsule network for the identification of the HIV ISs along important sequence positions Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-22 Minakshi Boruah, Ranjita Das
The most studied sub-category of the retroviruses is the human immunodeficiency virus (HIV), a virus of the Retroviridae family. HIV integration sites (HIV ISs) denote a crucial entity in the entire process of infection and in its rebound if therapy is interrupted. They determine the steps involved in the formation of the latent viral reserve. This
-
Enhancing open-set domain adaptation through unknown-filtering multi-classifier adversarial network Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-21 Qing Tian, Yi Zhao, Wangyuchen Wu, Jixin Sun
-
Flexible multi-objective particle swarm optimization clustering with game theory to address human activity discovery fully unsupervised Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-18 Parham Hadikhani, Daphne Teck Ching Lai, Wee-Hong Ong
-
An efficient deep learning architecture for effective fire detection in smart surveillance Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-16 Hikmat Yar, Zulfiqar Ahmad Khan, Imad Rida, Waseem Ullah, Min Je Kim, Sung Wook Baik
The threat of fire is pervasive, poses significant risks to the environment, and may include potential fatalities, property devastation, and socioeconomic disruption. Successfully mitigating these risks relies on the prompt identification of fires, a process in which soft computing methodologies play a pivotal role. However, these fire detection methodologies have neglected to explore the relationships
-
Feature disparity learning for weakly supervised object localization Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-16 Bingfeng Li, Haohao Ruan, Xinwei Li, Keping Wang
-
Model-agnostic progressive saliency map generation for object detector Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-15 Yicheng Yan, Tong Jiang, Xianfeng Li, Lianpeng Sun, Jinjun Zhu, Jianxin Lin
-
Integration of ultrasound and mammogram for multimodal classification of breast cancer using hybrid residual neural network and machine learning Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-12 Kushangi Atrey, Bikesh Kumar Singh, Narendra Kuber Bodhey
-
Authenticating and securing healthcare records: A deep learning-based zero watermarking approach Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-12 Ashima Anand, Jatin Bedi, Ashutosh Aggarwal, Muhammad Attique Khan, Imad Rida
-
Camouflaged object detection via cross-level refinement and interaction network Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-11 Yanliang Ge, Junchao Ren, Qiao Zhang, Min He, Hongbo Bi, Cong Zhang
-
RGB road scene material segmentation Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-11 Sudong Cai, Ryosuke Wakaki, Shohei Nobuhara, Ko Nishino
-
MRFormer: Multiscale retractable transformer for medical image progressive denoising via noise level estimation Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-11 Can Bai, Xianjun Han
-
Multi-axis interactive multidimensional attention network for vehicle re-identification Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-06 Xiyu Pang, Yanli Zheng, Xiushan Nie, Yilong Yin, Xi Li
Learning fine-grained discriminative information is essential to address the challenges of small inter-class differences and large intra-class differences in vehicle re-identification (Re-ID). Attention mechanisms are often used to capture important global information in images rather than fine-grained discriminative information. Studies have shown that the multi-axis interaction of information can
-
Arbitrary 3D stylization of radiance fields Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-06 Sijia Zhang, Ting Liu, Zhuoyuan Li, Yi Sun
3D stylization, which creates stylized multi-view images, is quite challenging, as it requires not only generating images that align with the desired style but also maintaining consistency across different perspectives. Most previous image style transfer methods focus on the 2D image domain and stylize each view independently, suffering from multi-view inconsistency. To tackle this challenging problem
-
An improved skin lesion detection solution using multi-step preprocessing features and NASNet transfer learning model Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-05 Abdulaziz Altamimi, Fadwa Alrowais, Hanen Karamti, Muhammad Umer, Lucia Cascone, Imran Ashraf
-
Robust visual tracking via modified Harris hawks optimization Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-01 Yuqi Xiao, Yongjun Wu
-
Nonlinear circumference-based robust ellipse detection in low-SNR images Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-29 Zhuoran Wang, Jianjun Yi, Hongkai Ding, Fei Zeng, Jinzhen Mu, Bin Wu
-
An efficient feature pyramid attention network for person re-identification Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Qian Luo, Jie Shao, Wanli Dang, Chao Wang, Libo Cao, Tao Zhang
For person re-identification, occlusion, appearance similarity and background clutter have always been challenges. To address these challenges effectively, we propose an efficient feature pyramid attention network (FPA-Net), which combines visual features from different levels to capture both detail features and higher-level information. Specifically, we embed a pair of attention mechanisms that complement
-
Image captioning: Semantic selection unit with stacked residual attention Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Lifei Song, Fei Li, Ying Wang, Yu Liu, Yuanhua Wang, Shiming Xiang
Semantic information and attention mechanism play important roles in the task of image captioning. Semantic information can strengthen the relationship between images and languages, while attention operation can steer the relevant regions spatially in the image. However, in most current works, semantic attributes are always confined to be learned from pairs of images and sentences, which ignore to
-
Improved YOLOv7 models based on modulated deformable convolution and swin transformer for object detection in fisheye images Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Jie Zhou, Degang Yang, Tingting Song, Yichen Ye, Xin Zhang, Yingze Song
Thanks to its wide field of view, the fisheye camera captures much more visual information and is therefore widely used in computer vision. However, fisheye images often require projection before they can be used for object detection, and this projection introduces distortion, while the discontinuous image edges leave objects incomplete. Fisheye images are characterized
-
C2F: An effective coarse-to-fine network for video summarization Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Ye Jin, Xiaoyan Tian, Zhao Zhang, Peng Liu, Xianglong Tang
The objective of video summarization is to develop a concise and condensed summary that accurately captures the original video content. Current supervised methods treat the task as a sequence-to-sequence problem. However, modeling the order of long videos presents three challenges: (1) capturing both local and global relationships simultaneously is challenging; (2)
-
A deep feature fusion network with global context and cross-dimensional dependencies for classification of mild cognitive impairment from brain MRI Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-27 T. Illakiya, R. Karthik, For the Alzheimer's Disease Neuroimaging Initiative
-
Multi-object tracking with adaptive measurement noise and information fusion Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-27 Xi Huang, Yinwei Zhan
-
Gaze analysis: A survey on its applications Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-26 Carmen Bisogni, Michele Nappi, Genoveffa Tortora, Alberto Del Bimbo
-
Attention guided multi-level feature aggregation network for camouflaged object detection Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-24 Anzhi Wang, Chunhong Ren, Shuang Zhao, Shibiao Mu
-
Audio-visual saliency prediction with multisensory perception and integration Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-23 Jiawei Xie, Zhi Liu, Gongyang Li, Yingjie Song
Audio-visual saliency prediction (AVSP) is a task that aims to model human attention patterns in the perception of auditory and visual scenes. Given the challenges associated with perceiving and combining multi-modal saliency features from videos, this paper presents a multi-sensory framework for AVSP. This framework is designed to extract audio, motion and image saliency features and integrate them
-
BPMB: BayesCNNs with perturbed multi-branch structure for robust facial expression recognition Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-23 Shuaishi Liu, Dongxu Zhao, Zhongbo Sun, Yuekun Chen
The wild Facial Expression Recognition (FER) task has been a long-standing challenge due to the various forms of uncertainty that exist in expression data. When expression data is fed into a convolutional neural network (CNN), the model's estimated parameters also become uncertain. This uncertainty gives rise to concerns regarding the reliability of the recognition results. To quantify these uncertainties and
-
Non-probability sampling network based on anomaly pedestrian trajectory discrimination for pedestrian trajectory prediction Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-22 Quankai Liu, Haifeng Sang, Jinyu Wang, Wangxing Chen, Yulong Liu
Pedestrian trajectory prediction in first-person view is an important support for achieving fully automated driving in cities. However, existing pedestrian trajectory prediction methods still have significant shortcomings in terms of pedestrian trajectory diversity, dynamic scene constraints, and dependence on long-term trajectory prediction. We propose a non-probability sampling network based on
-
Foreground and background separated image style transfer with a single text condition Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-21 Yue Yu, Jianming Wang, Nengli Li
Traditional image-based style transfer requires additional reference style images, making it less user-friendly. Text-based methods are more convenient but suffer from issues like slow generation, unclear content, and poor quality. In this work, we propose a new style transfer method, SA2-CS (Semantic-Aware and Salient Attention CLIPStyler), which is based on Contrastive Language-Image Pretraining
-
Explicit knowledge transfer of graph-based correlation distillation and diversity data hallucination for few-shot object detection Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-21 Meng Wang, Yang Wang, Haipeng Liu
-
PTET: A progressive token exchanging transformer for infrared and visible image fusion Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-21 Jun Huang, Ziang Chen, Yong Ma, Fan Fan, Linfeng Tang, Xinyu Xiang
-
CVAD-GAN: Constrained video anomaly detection via generative adversarial network Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-19 Rituraj Singh, Anikeit Sethi, Krishanu Saini, Sumeet Saurav, Aruna Tiwari, Sanjay Singh
Automatic detection of abnormal behavior in video sequences is a fundamental and challenging problem for intelligent video surveillance systems. However, the existing state-of-the-art Video Anomaly Detection (VAD) methods are computationally expensive and lack the desired robustness in real-world scenarios. The contemporary VAD methods cannot detect the fundamental features absent during training,
-
POSER: POsed vs Spontaneous Emotion Recognition using fractal encoding Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-19 Carmen Bisogni, Lucia Cascone, Michele Nappi, Chiara Pero
-
Multiple object detection and tracking from drone videos based on GM-YOLO and multi-tracker Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-19 Yubin Yuan, Yiquan Wu, Langyue Zhao, Huixian Chen, Yao Zhang
Multiple object tracking in drone videos is a vital vision task with broad application prospects, but most trackers use spatial or appearance cues alone to correlate detections. Our proposed Multi-Tracker uses a novel similarity measure that combines position and appearance information. We designed the GM-YOLO network to provide high-quality detections as input to Multi-Tracker, and we add a Coordinate Attention
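A similarity measure that combines position and appearance cues can be sketched as a weighted sum of box IoU and embedding cosine similarity. The weighting and all names below are illustrative assumptions, not the paper's actual measure.

```python
import math

# Illustrative detection-track association score: weighted combination of
# box IoU (position cue) and cosine similarity of appearance embeddings.

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    """Cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def similarity(det_box, trk_box, det_feat, trk_feat, w_pos=0.5):
    return w_pos * box_iou(det_box, trk_box) + (1 - w_pos) * cosine(det_feat, trk_feat)
```

Fusing both cues matters in drone footage: small, fast-moving targets make position alone unreliable across frames, while similar-looking pedestrians or vehicles make appearance alone ambiguous.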
-
Multi-depth branch network for efficient image super-resolution Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-18 Huiyuan Tian, Li Zhang, Shijian Li, Min Yao, Gang Pan
-
EMNet: Edge-guided multi-level network for salient object detection in low-light images Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-18 Lianghu Jing, Bo Wang
Salient object detection (SOD) has achieved remarkable performance in well-lit scenes. However, when generalized to low-light scenes, the performance of SOD decreases significantly owing to more challenging conditions such as weak brightness, low contrast, and a poor signal-to-noise ratio. To address this issue, we propose a novel edge-guided multi-level network (EMNet) for SOD in low-light images
-
ECT: Fine-grained edge detection with learned cause tokens Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-17 Shaocong Xu, Xiaoxue Chen, Yuhang Zheng, Guyue Zhou, Yurong Chen, Hongbin Zha, Hao Zhao
In this study, we tackle the challenging fine-grained edge detection task, which refers to predicting specific edges caused by reflectance, illumination, normal, and depth changes, respectively. Prior methods exploit multi-scale convolutional networks, which are limited in three aspects: (1) convolutions are local operators, while identifying the cause of edge formation requires looking at faraway pixels
-
Feature decoupling and interaction network for defending against adversarial examples Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-17 Weidong Wang, Zhi Li, Shuaiwei Liu, Li Zhang, Jin Yang, Yi Wang
-
Integrating prior knowledge into a bibranch pyramid network for medical image segmentation Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-17 Xianjun Han, Tiantian Li, Can Bai, Hongyu Yang
Medical image segmentation is crucial for obtaining accurate diagnoses, and while convolutional neural network (CNN)-based methods have made strides in recent years, they struggle with modeling long-range dependencies. Transformer-based methods improve this task but require more computational resources. The segment anything model (SAM) can generate pixel-level segmentation results for natural images
-
Gated contextual transformer network for multi-modal retinal image clinical description generation Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-15 Nagur Shareef Shaik, Teja Krishna Cherukuri
Generating semantically meaningful and coherent clinical description for the diagnosis of retinal images has been a challenging task for both Computer Vision and Natural Language Processing domains. This is mainly due to the fact that the clinical descriptions generated by the language model are completely dependent on the type of retinal image representations learned by the vision model. This work