-
A ResNet-101 deep learning framework induced transfer learning strategy for moving object detection Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-16 Upasana Panigrahi, Prabodh Kumar Sahoo, Manoj Kumar Panda, Ganapati Panda
Background subtraction is a crucial stage in many visual surveillance systems. The prime objective of any such system is to detect moving objects so that the system can address many real-time challenges. In the last few decades, various methods have been developed to detect moving objects. However, the performance of many existing methods needs further improvement for slow, moderate
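For readers unfamiliar with the task, classical background subtraction maintains a per-pixel background model and flags pixels that deviate from it as moving. The minimal sketch below illustrates only that baseline, not the paper's ResNet-101 transfer-learning method; the threshold and learning-rate values are arbitrary assumptions.

```python
# Minimal sketch of classical background subtraction with a running-average
# background model. Illustrative baseline only; parameter values are assumptions.

def update_background(bg, frame, alpha=0.05):
    """Exponential running average of the background, pixel by pixel."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def foreground_mask(bg, frame, threshold=25):
    """Pixels that deviate from the background model are marked as moving."""
    return [1 if abs(f - b) > threshold else 0 for b, f in zip(bg, frame)]

# Toy 1-D "frames": a static scene, then an object appears at index 2.
bg = [10.0, 10.0, 10.0, 10.0]
frame = [10.0, 10.0, 200.0, 10.0]
mask = foreground_mask(bg, frame)      # → [0, 0, 1, 0]
bg = update_background(bg, frame)      # background slowly absorbs the change
```

Slow-moving objects are exactly the hard case this baseline struggles with: they get absorbed into the background model, which motivates the learned approaches the abstract surveys.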
-
SAKD: Sparse attention knowledge distillation Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-16 Zhen Guo, Pengzhou Zhang, Peng Liang
Deep learning techniques have gained significant interest due to their success in large model scenarios. However, large models often require massive computational resources, which can challenge end devices with limited storage capabilities. Transferring knowledge from big to small models and achieving similar results with limited resources requires further research. Knowledge distillation techniques
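As background, the classic knowledge-distillation objective (Hinton et al.) trains a small student on temperature-softened teacher outputs. The truncated abstract does not specify SAKD's sparse-attention variant, so the sketch below shows only the standard loss; the logits and temperature are illustrative.

```python
import math

# Hedged sketch of the standard knowledge-distillation loss: KL divergence
# between temperature-softened teacher and student distributions.

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) over temperature-softened outputs."""
    p = softmax(teacher_logits, temperature)   # soft targets from the big model
    q = softmax(student_logits, temperature)   # student predictions
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss_same = kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])   # identical logits → 0
loss_diff = kd_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1])   # mismatched → positive
```

In practice this term is combined with the ordinary cross-entropy on hard labels; the soft targets carry the "dark knowledge" about inter-class similarity that the small model could not learn from one-hot labels alone.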
-
A new multi-picture architecture for learned video deinterlacing and demosaicing with parallel deformable convolution and self-attention blocks Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-15 Ronglei Ji, A. Murat Tekalp
Although real-world video deinterlacing and demosaicing are well suited to supervised learning from synthetically degraded data, since the degradation models are known and fixed, learned video deinterlacing and demosaicing have received much less attention than denoising and super-resolution tasks. We propose a new multi-picture architecture for video deinterlacing or demosaicing by
-
Deep learning and genetic algorithm-based ensemble model for feature selection and classification of breast ultrasound images Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-10 Mohsin Furkh Dar, Avatharam Ganivada
-
WaveletFormerNet: A Transformer-based wavelet network for real-world non-homogeneous and dense fog removal Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-10 Shengli Zhang, Zhiyong Tao, Sen Lin
-
A 3D multi-scale CycleGAN framework for generating synthetic PETs from MRIs for Alzheimer's disease diagnosis Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-09 M. Khojaste-Sarakhsi, Seyedhamidreza Shahabi Haghighi, S.M.T. Fatemi Ghomi, Elena Marchiori
This paper proposes a novel framework for generating synthesized PET images from MRIs to fill in missing PETs and help with Alzheimer's disease (AD) diagnosis. This framework employs a 3D multi-scale image-to-image CycleGAN architecture for the end-to-end translation of MRI and PET domains together. A hybrid loss function is also proposed to enforce structural similarity while preserving voxel-wise
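The truncated abstract does not spell out the hybrid loss, so the sketch below is only an illustrative combination of a voxel-wise L1 term with a crude structural term based on local intensity gradients; the `w_struct` weighting and the gradient-difference formulation are assumptions, not the paper's formula.

```python
# Illustrative hybrid loss: voxel-wise L1 plus a simple structural term that
# compares local intensity gradients (1-D toy version of a 3D volume loss).

def hybrid_loss(pred, target, w_struct=0.5):
    # Voxel-wise fidelity: mean absolute error.
    l1 = sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
    # Structural term: mean absolute difference of adjacent-voxel gradients.
    gp = [pred[i + 1] - pred[i] for i in range(len(pred) - 1)]
    gt = [target[i + 1] - target[i] for i in range(len(target) - 1)]
    struct = sum(abs(a - b) for a, b in zip(gp, gt)) / len(gp)
    return (1 - w_struct) * l1 + w_struct * struct

perfect = hybrid_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # → 0.0
offset = hybrid_loss([0.0, 0.0], [1.0, 1.0])              # L1 penalized, structure preserved
```

The design intent mirrors the abstract: a purely voxel-wise loss ignores anatomy-level structure, while a purely structural loss tolerates intensity drift, so the two are balanced.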
-
Mixup Mask Adaptation: Bridging the gap between input saliency and representations via attention mechanism in feature mixup Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-08 Minsoo Kang, Minkoo Kang, Seong-Whan Lee, Suhyun Kim
The inherent complexity and extensive architecture of deep neural networks often lead to overfitting, compromising their ability to generalize to new, unseen data. One of the regularization techniques, data augmentation, is now considered vital to alleviate this, and mixup, which blends pairs of images and labels, has proven effective in enhancing model generalization. Recently, incorporating saliency
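For reference, plain mixup (Zhang et al.) blends two samples and their labels with a Beta-distributed coefficient; the paper's Mixup Mask Adaptation refines this with attention-derived saliency masks, which this minimal sketch does not show.

```python
import random

# Minimal sketch of plain mixup: convex combination of two inputs and their
# one-hot labels with a Beta(alpha, alpha)-distributed coefficient.

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Blend a toy "cat" sample with a toy "dog" sample and their one-hot labels.
x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

Saliency-aware variants replace the scalar `lam` with a spatial mask so that the blended image keeps the discriminative regions of each source, which is the gap the abstract's attention mechanism addresses.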
-
Semantic segmentation of large-scale point clouds by integrating attention mechanisms and transformer models Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-06 Tiebiao Yuan, Yangyang Yu, Xiaolong Wang
-
Detection of dental periapical lesions using retinex based image enhancement and lightweight deep learning model Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-05 Vaishali Latke, Vaibhav Narawade
-
Alignment and fusion for adaptive domain nighttime semantic segmentation Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-03 Bao Zhang, Nianmin Yao, Jian Zhao, Yanan Zhang
In the field of autonomous driving technology, both daytime and nighttime scenes are common. However, due to the poor illumination and difficulty in manual annotation of nighttime images, semantic segmentation of nighttime scenes is more challenging compared to daytime scenes. Therefore, achieving significant progress in nighttime semantic segmentation would greatly enhance the effectiveness of the
-
Comparison of fine-tuning strategies for transfer learning in medical image classification Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-03 Ana Davila, Jacinto Colan, Yasuhisa Hasegawa
-
Robust ensemble person reidentification via orthogonal fusion with occlusion handling Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-03 Syeda Nyma Ferdous, Xin Li
Occlusion remains one of the major challenges in person reidentification (ReID) due to the diversity of poses and the variation of appearances. Developing novel architectures to improve the robustness of occlusion-aware person Re-ID requires new insights, especially on low-resolution edge cameras. We propose a deep ensemble model that harnesses both CNN and Transformer architectures to generate robust
-
Video anomaly detection based on a multi-layer reconstruction autoencoder with a variance attention strategy Image Vis. Comput. (IF 4.7) Pub Date : 2024-04-02 Shifeng Li, Yan Cheng, Liang Zhang, Xi Luo, Ruixuan Zhang
In this paper, we propose a comprehensive framework for detecting anomalies in videos based on autoencoder (AE). Traditional AE models solely rely on input and final reconstruction, potentially limiting their capacity to fully utilize the intermediate neural network layers. To mitigate this limitation, we introduce a novel approach that concurrently trains the model using corresponding intermediate
-
A three-dimensional human motion pose recognition algorithm based on graph convolutional networks Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-30 Linfang Sun, Ningning Li, Guangfeng Zhao, Gang Wang
The task of three-dimensional human motion pose recognition suffers from problems such as target loss, inaccurate target positioning, and high computational complexity. This article designs a recognition evaluation algorithm to address these issues: it designs a LiteHRNet model for extracting skeleton sequences from action videos and proposes a graph convolutional structure that combines residual networks
-
A 3D motion image recognition model based on 3D CNN-GRU model and attention mechanism Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-30 Chen Cheng, Huahu Xu
Moving image recognition has become a well-explored problem in computer vision. However, it is difficult for the traditional convolutional neural network (CNN) model to effectively capture temporal information in motion. Therefore, for better use of video sequence features and to improve the accuracy of action recognition, this paper proposes a three-dimensional CNN (3DCNN) model based on Gated Recurrent
-
A spatiotemporal motion prediction network based on multi-level feature disentanglement Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-29 Suting Chen, Yewen Bo, Xu Wu
-
Siamese network to assess scanner-related contrast variability in MRI Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-29 Matteo Polsinelli, Hongwei Bran Li, Filippo Mignosi, Li Zhang, Giuseppe Placidi
Magnetic Resonance Imaging (MRI) stands as a noninvasive tool for diagnosing and monitoring various diseases. The flexibility of MRI configuration parameters allows for adaptable imaging sequences, and at the same time poses challenges in terms of reproducibility, as variability in imaging sequences leads to significant differences in image contrast. This is one of the major causes that compromise
-
Underwater image quality optimization: Researches, challenges, and future trends Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-27 Mingjie Wang, Keke Zhang, Hongan Wei, Weiling Chen, Tiesong Zhao
Underwater images serve as crucial mediums for conveying marine information. Nevertheless, due to the inherent complexity of the underwater environment, underwater images often suffer from various quality degradation phenomena such as color deviation, low contrast, and non-uniform illumination. These degraded underwater images fail to meet the requirements of underwater computer vision applications
-
Enhancing temporal action localization in an end-to-end network through estimation error incorporation Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-27 Mozhgan Mokari, Khosrow Haj Sadeghi
Temporal action localization presents a significant challenge in computer vision, as the development of an efficient method for this task remains elusive. The objective is to identify human activities within untrimmed videos, determining when and which actions occur in each video. While using trimmed videos could potentially resolve the localization problem and enhance classification accuracy, it is
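As context, localization quality in this task is conventionally scored by temporal IoU between predicted and ground-truth action segments. The sketch below shows that standard matching criterion, not the paper's method; the segment boundaries are illustrative.

```python
# Standard temporal IoU between two (start, end) time segments, the usual
# matching criterion for evaluating temporal action localization.

def temporal_iou(seg_a, seg_b):
    """IoU of two time segments given as (start, end), e.g. in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

iou = temporal_iou((2.0, 8.0), (4.0, 10.0))  # overlap 4 s, union 8 s → 0.5
```

A predicted segment typically counts as a true positive only when its temporal IoU with a ground-truth action exceeds a threshold (commonly 0.5), which is why boundary estimation errors, the subject of this abstract, directly hurt reported mAP.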
-
Semantic segmentation using cross-stage feature reweighting and efficient self-attention Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-26 Yingdong Ma, Xiaobin Lan
Recently, vision transformers have demonstrated strong performance in various computer vision tasks. The success of ViTs can be attributed to their ability to capture long-range dependencies. However, transformer-based approaches often yield segmentation maps with incomplete object structures because of restricted cross-stage information propagation and a lack of low-level details. To address these problems
-
Enhancing fall prediction in the elderly people using LBP features and transfer learning model Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-26 Muhammad Umer, Aisha Ahmed Alarfaj, Ebtisam Abdullah Alabdulqader, Shtwai Alsubai, Lucia Cascone, Fabio Narducci
-
MLCapsNet +: A multi-capsule network for the identification of the HIV ISs along important sequence positions Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-22 Minakshi Boruah, Ranjita Das
The most studied sub-category of the retroviruses is the human immunodeficiency virus (HIV), a virus of the Retroviridae family. HIV integration sites (HIV ISs) denote a crucial entity in the entire process of infection and in its rebound if therapy is interrupted. They determine the steps involved in the formation of the latent viral reserve. This
-
Enhancing open-set domain adaptation through unknown-filtering multi-classifier adversarial network Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-21 Qing Tian, Yi Zhao, Wangyuchen Wu, Jixin Sun
-
Flexible multi-objective particle swarm optimization clustering with game theory to address human activity discovery fully unsupervised Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-18 Parham Hadikhani, Daphne Teck Ching Lai, Wee-Hong Ong
-
An efficient deep learning architecture for effective fire detection in smart surveillance Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-16 Hikmat Yar, Zulfiqar Ahmad Khan, Imad Rida, Waseem Ullah, Min Je Kim, Sung Wook Baik
The threat of fire is pervasive, poses significant risks to the environment, and may include potential fatalities, property devastation, and socioeconomic disruption. Successfully mitigating these risks relies on the prompt identification of fires, a process in which soft computing methodologies play a pivotal role. However, these fire detection methodologies have neglected to explore the relationships
-
Feature disparity learning for weakly supervised object localization Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-16 Bingfeng Li, Haohao Ruan, Xinwei Li, Keping Wang
-
Model-agnostic progressive saliency map generation for object detector Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-15 Yicheng Yan, Tong Jiang, Xianfeng Li, Lianpeng Sun, Jinjun Zhu, Jianxin Lin
-
Integration of ultrasound and mammogram for multimodal classification of breast cancer using hybrid residual neural network and machine learning Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-12 Kushangi Atrey, Bikesh Kumar Singh, Narendra Kuber Bodhey
-
Authenticating and securing healthcare records: A deep learning-based zero watermarking approach Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-12 Ashima Anand, Jatin Bedi, Ashutosh Aggarwal, Muhammad Attique Khan, Imad Rida
-
Camouflaged object detection via cross-level refinement and interaction network Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-11 Yanliang Ge, Junchao Ren, Qiao Zhang, Min He, Hongbo Bi, Cong Zhang
-
RGB road scene material segmentation Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-11 Sudong Cai, Ryosuke Wakaki, Shohei Nobuhara, Ko Nishino
-
MRFormer: Multiscale retractable transformer for medical image progressive denoising via noise level estimation Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-11 Can Bai, Xianjun Han
-
Multi-axis interactive multidimensional attention network for vehicle re-identification Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-06 Xiyu Pang, Yanli Zheng, Xiushan Nie, Yilong Yin, Xi Li
Learning fine-grained discriminative information is essential to address the challenges of small inter-class differences and large intra-class differences in vehicle re-identification (Re-ID). Attention mechanisms are often used to capture important global information in images rather than fine-grained discriminative information. Studies have shown that the multi-axis interaction of information can
-
Arbitrary 3D stylization of radiance fields Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-06 Sijia Zhang, Ting Liu, Zhuoyuan Li, Yi Sun
3D stylization, which creates stylized multi-view images, is quite challenging, as it requires not only generating images that align with the desired style but also maintaining consistency across different perspectives. Most previous image style transfer methods focus on the 2D image domain and stylize each view independently, suffering from multi-view inconsistency. To tackle this challenging problem
-
An improved skin lesion detection solution using multi-step preprocessing features and NASNet transfer learning model Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-05 Abdulaziz Altamimi, Fadwa Alrowais, Hanen Karamti, Muhammad Umer, Lucia Cascone, Imran Ashraf
-
Robust visual tracking via modified Harris hawks optimization Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-01 Yuqi Xiao, Yongjun Wu
-
Nonlinear circumference-based robust ellipse detection in low-SNR images Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-29 Zhuoran Wang, Jianjun Yi, Hongkai Ding, Fei Zeng, Jinzhen Mu, Bin Wu
-
An efficient feature pyramid attention network for person re-identification Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Qian Luo, Jie Shao, Wanli Dang, Chao Wang, Libo Cao, Tao Zhang
For person re-identification, occlusion, appearance similarity and background clutter have always been challenges. To address these challenges effectively, we propose an efficient feature pyramid attention network (FPA-Net), which combines visual features from different levels to capture both detail features and higher-level information. Specifically, we embed a pair of attention mechanisms that complement
-
Image captioning: Semantic selection unit with stacked residual attention Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Lifei Song, Fei Li, Ying Wang, Yu Liu, Yuanhua Wang, Shiming Xiang
Semantic information and attention mechanism play important roles in the task of image captioning. Semantic information can strengthen the relationship between images and languages, while attention operation can steer the relevant regions spatially in the image. However, in most current works, semantic attributes are always confined to be learned from pairs of images and sentences, which ignore to
-
Improved YOLOv7 models based on modulated deformable convolution and swin transformer for object detection in fisheye images Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Jie Zhou, Degang Yang, Tingting Song, Yichen Ye, Xin Zhang, Yingze Song
Thanks to its wide field of view, the fisheye camera captures much more visual information and is therefore widely used in computer vision. However, fisheye images often require projection before they can be used for object detection, and this projection introduces distortion, while the discontinuous image edges leave objects incomplete. Fisheye images are characterized
-
C2F: An effective coarse-to-fine network for video summarization Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Ye Jin, Xiaoyan Tian, Zhao Zhang, Peng Liu, Xianglong Tang
The objective of video summarization is to develop a concise and condensed summary that accurately captures the original video content. Current supervised methods treat the task as a sequence-to-sequence problem. However, modeling the order of long videos presents three challenges: (1) capturing both local and global relationships simultaneously is challenging; (2)
-
A deep feature fusion network with global context and cross-dimensional dependencies for classification of mild cognitive impairment from brain MRI Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-27 T. Illakiya, R. Karthik, For the Alzheimer's Disease Neuroimaging Initiative
-
Multi-object tracking with adaptive measurement noise and information fusion Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-27 Xi Huang, Yinwei Zhan
-
Gaze analysis: A survey on its applications Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-26 Carmen Bisogni, Michele Nappi, Genoveffa Tortora, Alberto Del Bimbo
-
Attention guided multi-level feature aggregation network for camouflaged object detection Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-24 Anzhi Wang, Chunhong Ren, Shuang Zhao, Shibiao Mu
-
Audio-visual saliency prediction with multisensory perception and integration Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-23 Jiawei Xie, Zhi Liu, Gongyang Li, Yingjie Song
Audio-visual saliency prediction (AVSP) is a task that aims to model human attention patterns in the perception of auditory and visual scenes. Given the challenges associated with perceiving and combining multi-modal saliency features from videos, this paper presents a multi-sensory framework for AVSP. This framework is designed to extract audio, motion and image saliency features and integrate them
-
BPMB: BayesCNNs with perturbed multi-branch structure for robust facial expression recognition Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-23 Shuaishi Liu, Dongxu Zhao, Zhongbo Sun, Yuekun Chen
The wild Facial Expression Recognition (FER) task has been a long-standing challenge due to the various forms of uncertainty that exist in expression data. When expression data is fed into a convolutional neural network (CNN), the model's estimated parameters also become uncertain. This uncertainty gives rise to concerns regarding the reliability of the recognition results. To quantify these uncertainties and
-
Non-probability sampling network based on anomaly pedestrian trajectory discrimination for pedestrian trajectory prediction Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-22 Quankai Liu, Haifeng Sang, Jinyu Wang, Wangxing Chen, Yulong Liu
Pedestrian trajectory prediction in first-person view is an important support for achieving fully automated driving in cities. However, existing pedestrian trajectory prediction methods still have significant shortcomings in terms of pedestrian trajectory diversity, dynamic scene constraints, and dependence on long-term trajectory prediction. We propose a non-probability sampling network based on
-
Foreground and background separated image style transfer with a single text condition Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-21 Yue Yu, Jianming Wang, Nengli Li
Traditional image-based style transfer requires additional reference style images, making it less user-friendly. Text-based methods are more convenient but suffer from issues like slow generation, unclear content, and poor quality. In this work, we propose a new style transfer method, SA2-CS (Semantic-Aware and Salient Attention CLIPStyler), which is based on Contrastive Language-Image Pretraining
-
Explicit knowledge transfer of graph-based correlation distillation and diversity data hallucination for few-shot object detection Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-21 Meng Wang, Yang Wang, Haipeng Liu
-
PTET: A progressive token exchanging transformer for infrared and visible image fusion Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-21 Jun Huang, Ziang Chen, Yong Ma, Fan Fan, Linfeng Tang, Xinyu Xiang
-
CVAD-GAN: Constrained video anomaly detection via generative adversarial network Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-19 Rituraj Singh, Anikeit Sethi, Krishanu Saini, Sumeet Saurav, Aruna Tiwari, Sanjay Singh
Automatic detection of abnormal behavior in video sequences is a fundamental and challenging problem for intelligent video surveillance systems. However, the existing state-of-the-art Video Anomaly Detection (VAD) methods are computationally expensive and lack the desired robustness in real-world scenarios. The contemporary VAD methods cannot detect the fundamental features absent during training,
-
POSER: POsed vs Spontaneous Emotion Recognition using fractal encoding Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-19 Carmen Bisogni, Lucia Cascone, Michele Nappi, Chiara Pero
-
Multiple object detection and tracking from drone videos based on GM-YOLO and multi-tracker Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-19 Yubin Yuan, Yiquan Wu, Langyue Zhao, Huixian Chen, Yao Zhang
Multiple object tracking in drone videos is a vital vision task with broad application prospects, but most trackers use spatial or appearance cues alone to correlate detections. Our proposed Multi-Tracker uses a novel similarity measure that combines position and appearance information. We designed the GM-YOLO network to provide high-quality detections as input to Multi-Tracker, and we add a Coordinate Attention
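A similarity measure that combines position and appearance cues can be sketched as a weighted sum of box IoU and embedding cosine similarity. The weighting and all names below are illustrative assumptions, not the paper's actual measure.

```python
import math

# Illustrative detection-track association score: weighted combination of
# box IoU (position cue) and cosine similarity of appearance embeddings.

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    """Cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def similarity(det_box, trk_box, det_feat, trk_feat, w_pos=0.5):
    return w_pos * box_iou(det_box, trk_box) + (1 - w_pos) * cosine(det_feat, trk_feat)
```

Fusing both cues matters in drone footage: small, fast-moving targets make position alone unreliable across frames, while similar-looking pedestrians or vehicles make appearance alone ambiguous.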
-
Multi-depth branch network for efficient image super-resolution Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-18 Huiyuan Tian, Li Zhang, Shijian Li, Min Yao, Gang Pan
-
EMNet: Edge-guided multi-level network for salient object detection in low-light images Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-18 Lianghu Jing, Bo Wang
Salient object detection (SOD) has achieved remarkable performance in well-lit scenes. However, when generalized to low-light scenes, the performance of SOD decreases significantly owing to more challenging conditions such as weak brightness, low contrast, and a poor signal-to-noise ratio. To address this issue, we propose a novel edge-guided multi-level network (EMNet) for SOD in low-light images
-
ECT: Fine-grained edge detection with learned cause tokens Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-17 Shaocong Xu, Xiaoxue Chen, Yuhang Zheng, Guyue Zhou, Yurong Chen, Hongbin Zha, Hao Zhao
In this study, we tackle the challenging fine-grained edge detection task, which refers to predicting specific edges caused by reflectance, illumination, normal, and depth changes, respectively. Prior methods exploit multi-scale convolutional networks, which are limited in three aspects: (1) convolutions are local operators, while identifying the cause of edge formation requires looking at faraway pixels
-
Feature decoupling and interaction network for defending against adversarial examples Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-17 Weidong Wang, Zhi Li, Shuaiwei Liu, Li Zhang, Jin Yang, Yi Wang
-
Integrating prior knowledge into a bibranch pyramid network for medical image segmentation Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-17 Xianjun Han, Tiantian Li, Can Bai, Hongyu Yang
Medical image segmentation is crucial for obtaining accurate diagnoses, and while convolutional neural network (CNN)-based methods have made strides in recent years, they struggle with modeling long-range dependencies. Transformer-based methods improve this task but require more computational resources. The segment anything model (SAM) can generate pixel-level segmentation results for natural images
-
Gated contextual transformer network for multi-modal retinal image clinical description generation Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-15 Nagur Shareef Shaik, Teja Krishna Cherukuri
Generating semantically meaningful and coherent clinical description for the diagnosis of retinal images has been a challenging task for both Computer Vision and Natural Language Processing domains. This is mainly due to the fact that the clinical descriptions generated by the language model are completely dependent on the type of retinal image representations learned by the vision model. This work