-
Multi-grained Representation Aggregating Transformer with Gating Cycle for Change Captioning ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-22 Shengbin Yue, Yunbin Tu, Liang Li, Shengxiang Gao, Zhengtao Yu
Change captioning aims to describe the difference within an image pair in natural language, which combines visual comprehension and language generation. Although significant progress has been achieved, perceiving the object change from different perspectives remains a key challenge, especially in the severe situation of drastic viewpoint change. In this paper, we propose a novel full-attentive
-
C2: ABR Streaming in Cognizant of Consumption Context for Improved QoE and Resource Usage Tradeoffs ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-18 Cheonjin Park, Chinmaey Shende, Subhabrata Sen, Bing Wang
Smartphones have emerged as ubiquitous platforms for people to consume content in a wide range of consumption contexts (C2), e.g., over cellular or WiFi, playing back audio and video directly on phone or through peripheral devices such as external screens or speakers. In this paper, we argue that a user’s specific C2 is an important factor to consider in Adaptive Bitrate (ABR) streaming. We examine
-
Seventeen Years of the ACM Transactions on Multimedia Computing, Communications and Applications: A Bibliometric Overview ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-18 Walayat Hussain, Honghao Gao, Rafiul Karim, Abdulmotaleb El Saddik
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) has been dedicated to advancing multimedia research, fostering discoveries, innovations, and practical applications since 2005. The journal consistently publishes top-notch, original research in emerging fields through open submissions, calls for papers, special issues, rigorous review processes, and diverse research
-
Rank-based Hashing for Effective and Efficient Nearest Neighbor Search for Image Retrieval ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-16 Vinicius Sato Kawai, Lucas Pascotti Valem, Alexandro Baldassin, Edson Borin, Daniel Carlos Guimarães Pedronette, Longin Jan Latecki
The large and growing amount of digital data creates a pressing need for approaches capable of indexing and retrieving multimedia content. A traditional and fundamental challenge consists of effectively and efficiently performing nearest-neighbor searches. After decades of research, several different methods are available, including trees, hashing, and graph-based approaches. Most of the current methods
-
A Quality-Aware and Obfuscation-Based Data Collection Scheme for Cyber-Physical Metaverse Systems ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-16 Jianheng Tang, Kejia Fan, Wenjie Yin, Shihao Yang, Yajiang Huang, Anfeng Liu, Neal N. Xiong, Mianxiong Dong, Tian Wang, Shaobo Zhang
In pursuit of an immersive virtual experience within the Cyber-Physical Metaverse Systems (CPMS), the construction of Avatars often requires a significant amount of real-world data. Mobile Crowd Sensing (MCS) has emerged as an efficient method for collecting data for CPMS. While progress has been made in protecting the privacy of workers, little attention has been given to safeguarding task privacy
-
Leveraging Frame- and Feature-Level Progressive Augmentation for Semi-supervised Action Recognition ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-11 Zhewei Tu, Xiangbo Shu, Peng Huang, Rui Yan, Zhenxing Liu, Jiachao Zhang
Semi-supervised action recognition is a challenging yet prospective task due to its low reliance on costly labeled videos. One high-profile solution is to explore frame-level weak/strong augmentations for learning abundant representations, inspired by the FixMatch framework dominating the semi-supervised image classification task. However, such a solution mainly brings perturbations in terms of texture
-
GANs in the Panorama of Synthetic Data Generation Methods: Application and Evaluation: Enhancing Fake News Detection with GAN-Generated Synthetic Data ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-10 Bruno Vaz, Álvaro Figueira
This paper focuses on the creation and evaluation of synthetic data to address the challenges of imbalanced datasets in machine learning applications (ML), using fake news detection as a case study. We conducted a thorough literature review on generative adversarial networks (GANs) for tabular data, synthetic data generation methods, and synthetic data quality assessment. By augmenting a public news
-
SigFormer: Sparse Signal-Guided Transformer for Multi-Modal Action Segmentation ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-10 Qi Liu, Xinchen Liu, Kun Liu, Xiaoyan Gu, Wu Liu
Multi-modal human action segmentation is a critical and challenging task with a wide range of applications. Nowadays, the majority of approaches concentrate on the fusion of dense signals (i.e., RGB, optical flow, and depth maps). However, the potential contributions of sparse IoT sensor signals, which can be crucial for achieving accurate recognition, have not been fully explored. To make up for this
-
DBGAN: Dual Branch Generative Adversarial Network for Multi-modal MRI Translation ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-10 Jun Lyu, Shouang Yan, M. Shamim Hossain
Existing Magnetic resonance imaging (MRI) translation models rely on Generative Adversarial Networks, primarily employing simple convolutional neural networks. Unfortunately, these networks struggle to capture global representations and contextual relationships within MRI images. While the advent of Transformers enables capturing long-range feature dependencies, they often compromise the preservation
-
Psychology-Guided Environment Aware Network for Discovering Social Interaction Groups from Videos ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-09 Jiaqi Yu, Jinhai Yang, Hua Yang, Renjie Pan, Pingrui Lai, Guangtao Zhai
Social interaction is a common phenomenon in human societies. Different from discovering groups based on the similarity of individuals’ actions, social interaction focuses more on the mutual influence between people. Although people can easily judge whether or not there are social interactions in a real-world scene, it is difficult for an intelligent system to discover social interactions. Initiating
-
RSUIGM: Realistic Synthetic Underwater Image Generation with Image Formation Model ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-08 Chaitra Desai, Sujay Benur, Ujwala Patil, Uma Mudenagudi
In this paper, we propose to synthesize realistic underwater images with a novel image formation model that considers both downwelling depth and line of sight (LOS) distance as cues, and we call it the Realistic Synthetic Underwater Image Generation Model (RSUIGM). The light interaction in the ocean is a complex process and demands specific modeling of direct and backscattering phenomena to capture the degradations
-
High Fidelity Makeup via 2D and 3D Identity Preservation Net ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-08 Jinliang Liu, Zhedong Zheng, Zongxin Yang, Yi Yang
In this paper, we address the challenging makeup transfer task, aiming to transfer makeup from a reference image to a source image while preserving facial geometry and background consistency. Existing deep neural network-based methods have shown promising results in aligning facial parts and transferring makeup textures. However, they often neglect the facial geometry of the source image, leading to
-
A Self-Defense Copyright Protection Scheme for NFT Image Art Based on Information Embedding ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-06 Fan Wang, Zhangjie Fu, Xiang Zhang
Non-fungible tokens (NFTs) have become a fundamental part of the metaverse ecosystem due to their uniqueness and immutability. However, existing copyright protection schemes for NFT image art rely on the NFTs themselves, which are minted by third-party platforms. A minted NFT image art only tracks and verifies the entire transaction process, but the legitimacy of the source and ownership of its mapped digital image
-
Facial soft-biometrics obfuscation through adversarial attacks ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-06 Vincenzo Carletti, Pasquale Foggia, Antonio Greco, Alessia Saggese, Mario Vento
Sharing facial pictures through online services, especially on social networks, has become a common habit for thousands of users. This practice hides a possible threat to privacy: the owners of such services, as well as malicious users, could automatically extract information from faces using modern and effective neural networks. In this paper, we propose the harmless use of adversarial attacks, i
-
MEDUSA: A Dynamic Codec Switching Approach in HTTP Adaptive Streaming ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-05 Daniele Lorenzi, Farzad Tashtarian, Hermann Hellwagner, Christian Timmerer
HTTP Adaptive Streaming (HAS) solutions utilize various Adaptive BitRate (ABR) algorithms to dynamically select appropriate video representations, aiming to adapt to fluctuations in network bandwidth. However, current ABR implementations have a limitation in that they are designed to function with one set of video representations, i.e., the bitrate ladder, which differ in bitrate and resolution, but
-
Heterogeneous Fusion and Integrity Learning Network for RGB-D Salient Object Detection ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-05 Haoran Gao, Yiming Su, Fasheng Wang, Haojie Li
While significant progress has been made in recent years in the field of salient object detection (SOD), there are still limitations in heterogeneous modality fusion and salient feature integrity learning. The former is primarily attributed to a paucity of attention from researchers to the fusion of cross-scale information between different modalities during processing multi-modal heterogeneous data
-
Universal Relocalizer for Weakly Supervised Referring Expression Grounding ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-04 Panpan Zhang, Meng Liu, Xuemeng Song, Da Cao, Zan Gao, Liqiang Nie
This paper introduces the Universal Relocalizer, a novel approach designed for weakly supervised referring expression grounding. Our method strives to pinpoint a target proposal that corresponds to a specific query, eliminating the need for region-level annotations during training. To bolster the localization precision and enrich the semantic understanding of the target proposal, we devise three key
-
Multi-Domain Image-to-Image Translation with Cross-Granularity Contrastive Learning ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-04 Huiyuan Fu, Jin Liu, Ting Yu, Xin Wang, Huadong Ma
The objective of multi-domain image-to-image translation is to learn the mapping from a source domain to a target domain in multiple image domains while preserving the content representation of the source domain. Despite the importance and recent efforts, most previous studies disregard the large style discrepancy between images and instances in various domains, or fail to capture instance details
-
Dual Dynamic Threshold Adjustment Strategy ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-03 Xiruo Jiang, Yazhou Yao, Sheng Liu, Fumin Shen, Liqiang Nie, Xian-Sheng Hua
Loss functions and sample mining strategies are essential components in deep metric learning algorithms. However, existing loss functions or mining strategies often necessitate the incorporation of additional hyperparameters, notably the threshold, which defines whether the sample pair is informative. The threshold provides a stable numerical standard for determining whether to retain the pairs. It
-
Inter-Camera Identity Discrimination for Unsupervised Person Re-Identification ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-03 Mingfu Xiong, Kaikang Hu, Zhihan Lv, Fei Fang, Zhongyuan Wang, Ruimin Hu, Khan Muhammad
Unsupervised person re-identification (Re-ID) has garnered significant attention because of its data-friendly nature, as it does not require labeled data. Existing approaches primarily address this challenge by employing feature-clustering techniques to generate pseudo-labels. In addition, camera-proxy-based methods have emerged because of their impressive ability to cluster sample identities. However
-
StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-03 Xiaolong Shen, Zhedong Zheng, Yi Yang
The goal of sign language recognition (SLR) is to help those who are hard of hearing or deaf overcome the communication barrier. Most existing approaches can be typically divided into two lines, i.e., Skeleton-based and RGB-based methods, but both lines of methods have their limitations. Skeleton-based methods do not consider facial expressions, while RGB-based approaches usually ignore the
-
Multimodal Score Fusion with Sparse Low Rank Bilinear Pooling for Egocentric Hand Action Recognition ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-02 Kankana Roy
With the advent of egocentric cameras, there are new challenges where traditional computer vision methods are not sufficient to handle this kind of video. Moreover, egocentric cameras often offer multiple modalities which need to be modeled jointly to exploit complementary information. In this paper, we propose a sparse low-rank bilinear score pooling approach for egocentric hand action recognition from
-
Double Reference Guided Interactive 2D and 3D Caricature Generation ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-01 Xin Huang, Dong Liang, Hongrui Cai, Yunfeng Bai, Juyong Zhang, Feng Tian, Jinyuan Jia
In this paper, we propose the first geometry and texture (double) referenced interactive 2D and 3D caricature generating and editing method. The main challenge of caricature generation lies in the fact that it not only exaggerates the facial geometry but also refreshes the facial texture. We address this challenge by utilizing the semantic segmentation maps as an intermediary domain, removing the influence
-
Text-Guided Synthesis of Masked Face Images ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-30 Anjali T, Masilamani V
The COVID-19 pandemic has made us all understand that wearing a face mask protects us from the spread of respiratory viruses. The face authentication systems, which are trained on the basis of facial key points such as the eyes, nose, and mouth, found it difficult to identify the person when the majority of the face is covered by the face mask. Removing the mask for authentication will cause the infection
-
Effective Video Summarization by Extracting Parameter-free Motion Attention ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-30 Tingting Han, Quan Zhou, Jun Yu, Zhou Yu, Jianhui Zhang, Sicheng Zhao
Video summarization remains a challenging task despite increasing research efforts. Traditional methods focus solely on long-range temporal modeling of video frames, overlooking important local motion information which cannot be captured by frame-level video representations. In this paper, we propose the Parameter-free Motion Attention Module (PMAM) to exploit the crucial motion clues potentially
-
Paying Attention to Vehicles: A Systematic Review on Transformer-Based Vehicle Re-Identification ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-30 Yan Qian, Johan Barthélemy, Bo Du, Jun Shen
Vehicle re-identification (v-reID) is a crucial and challenging task in the intelligent transportation systems (ITS). While vehicle re-identification plays a role in analysing traffic behaviour, criminal investigation, or automatic toll collection, it is also a key component for the construction of smart cities. With the recent introduction of transformer models and their rapid development in computer
-
RAC-Chain: An Asynchronous Consensus-based Cross-chain Approach to Scalable Blockchain for Metaverse ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Tianxiu Xie, Keke Gai, Liehuang Zhu, Shuo Wang, Zijian Zhang
The metaverse, as an emerging technical term, conceptually aims to construct a virtual digital space that runs parallel to the physical world. Due to human behaviors and interactions being represented in the virtual world, security in the metaverse is a challenging issue in which the traditional centralized service model is one of the threat sources. To conquer the obstacle caused by centralized computing
-
A Privacy-preserving Auction Mechanism for Learning Model as an NFT in Blockchain-driven Metaverse ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Qinnan Zhang, Zehui Xiong, Jianming Zhu, Sheng Gao, Wanting Yang
The Metaverse, envisioned as the next-generation Internet, will be constructed via twining a practical world in a virtual form, wherein Metaverse service providers (MSPs) are required to collect massive data from Metaverse users (MUs). In this regard, a critical demand exists for MSPs to motivate MUs to contribute computing resources and data while preserving user privacy. Federated learning (FL)
-
Detection of Adversarial Facial Accessory Presentation Attacks Using Local Face Differential ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Fei Peng, Le Qin, Min Long, Jin Li
To counter adversarial facial accessory presentation attacks (PAs), a detection method based on local face differential is proposed in this article. It extracts the local face differential features from a suspected face image and a reference face image, and then adaptively fuses the differential features of different local face regions to detect adversarial facial accessory PAs. Meanwhile, the principle
-
Tensorial Evolutionary Optimization for Natural Image Matting ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Si-Chao Lei, Yue-Jiao Gong, Xiao-Lin Xiao, Yi-Cong Zhou, Jun Zhang
Natural image matting has garnered increasing attention in various computer vision applications. The matting problem aims to find the optimal foreground/background (F/B) color pair for each unknown pixel and thus obtain an alpha matte indicating the opacity of the foreground object. This problem is typically modeled as a large-scale pixel pair combinatorial optimization (PPCO) problem. Heuristic optimization
-
ISF-GAN: Imagine, Select, and Fuse with GPT-Based Text Enrichment for Text-to-Image Synthesis ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-28 Yefei Sheng, Ming Tao, Jie Wang, Bing-Kun Bao
Text-to-Image synthesis aims to generate an accurate and semantically consistent image from a given text description. However, it is difficult for existing generative methods to generate semantically complete images from a single piece of text. Some works try to expand the input text to multiple captions via retrieving similar descriptions of the input text from the training set, but still fail to
-
4D Facial Expression Diffusion Model ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-28 Kaifeng Zou, Sylvain Faisan, Boyang Yu, Sébastien Valette, Hyewon Seo
Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. The challenging task, traditionally having relied heavily on digital craftspersons, remains yet to be explored. In this paper, we introduce a generative framework for generating 3D facial expression sequences (i.e. 4D faces) that can be conditioned on different
-
A Unified Framework for Jointly Compressing Visual and Semantic Data ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-28 Shizhan Liu, Weiyao Lin, Yihang Chen, Yufeng Zhang, Wenrui Dai, John See, Hongkai Xiong
The rapid advancement of multimedia and imaging technologies has resulted in increasingly diverse visual and semantic data. A large range of applications such as remote-assisted driving requires the amalgamated storage and transmission of various visual and semantic data. However, existing works suffer from the limitation of insufficiently exploiting the redundancy between different types of data.
-
Semantics and Non-fungible Tokens for Copyright Management on the Metaverse and Beyond ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Roberto García, Ana Cediel, Mercè Teixidó, Rosa Gil
Recent initiatives related to the Metaverse focus on better visualization, like augmented or virtual reality, but also persistent digital objects. To guarantee real ownership of these digital objects, open systems based on public blockchains and Non-Fungible Tokens (NFTs) are emerging together with a nascent decentralized and open creator economy. To manage this emerging economy in a more organized
-
HCNCT: A Cross-chain Interaction Scheme for the Blockchain-based Metaverse ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Yongjun Ren, Zhiying Lv, Neal N. Xiong, Jin Wang
As a new type of digital living space that blends the virtual and the real, the Metaverse combines many emerging technologies. It provides an immersive experience based on VR technology and stores and protects users’ digital content and digital assets through blockchain technology. However, different virtual environments are often highly heterogeneous in terms of underlying architecture and software implementation
-
QuickCSGModeling: Quick CSG Operations Based on Fusing Signed Distance Fields for VR Modeling ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Shuangmin Chen, Rui Xu, Jian Xu, Shiqing Xin, Changhe Tu, Chenglei Yang, Lin Lu
The latest advancements in Virtual Reality (VR) enable the creation of 3D models within a holographic immersive simulation environment. In this article, we create QuickCSGModeling, a user-friendly mid-air interactive modeling system. We first prepare a dataset consisting of diverse components and precompute the discrete signed distance function (SDF) for each component. During the modeling phase, users
-
MIS: A Multi-Identifier Management and Resolution System in the Metaverse ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Han Wang, Hui Li, Abla Smahi, Feng Zhao, Yao Yao, Ching Chuen Chan, Shiyu Wang, Wenyuan Yang, Shuo-Yen Robert Li
The metaverse gradually evolves into a virtual world containing a series of interconnected sub-metaverses. Diverse digital resources, including identities, contents, services, and supporting data, are key components of the sub-metaverse. Therefore, a Domain Name System (DNS)-like system is necessary for efficient management and resolution. However, the legacy DNS was designed with security vulnerabilities
-
Multimodal Visual-Semantic Representations Learning for Scene Text Recognition ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Xinjian Gao, Ye Pang, Yuyu Liu, Maokun Han, Jun Yu, Wei Wang, Yuanxu Chen
Scene Text Recognition (STR), the critical step in OCR systems, has attracted much attention in computer vision. Recent research on modeling textual semantics with Language Model (LM) has witnessed remarkable progress. However, LM only optimizes the joint probability of the estimated characters generated from the Vision Model (VM) in a single language modality, ignoring the visual-semantic relations
-
Joint Distortion Restoration and Quality Feature Learning for No-reference Image Quality Assessment ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Jifan Yang, Zhongyuan Wang, Baojin Huang, Jiaxin Ai, Yuhong Yang, Zixiang Xiong
No-reference image quality assessment (NR-IQA) methods, inspired by the free energy principle, improve the accuracy of image quality prediction by simulating the human brain’s repair process for distorted images. However, existing methods use separate optimization schemes for distortion restoration and quality prediction, which undermines the accurate mapping of feature representations to quality scores
-
Scene Graph Lossless Compression with Adaptive Prediction for Objects and Relations ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Weiyao Lin, Yufeng Zhang, Wenrui Dai, Huabin Liu, John See, Hongkai Xiong
The scene graph is a novel data structure describing objects and their pairwise relationship within image scenes. As the size of scene graphs in vision and multimedia applications increases, the need for lossless storage and transmission of such data becomes more critical. However, the compression of scene graphs is less studied because of the complicated data structures involved and complex distributions
-
Real-time Attentive Dilated U-Net for Extremely Dark Image Enhancement ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27 Junjian Huang, Hao Ren, Shulin Liu, Yong Liu, Chuanlu Lv, Jiawen Lu, Changyong Xie, Hong Lu
Images taken under low-light conditions suffer from poor visibility, color distortion and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the field of autonomous driving, making low-light enhancement an indispensable basic component of high-level visual tasks. Low-light enhancement aims to
-
Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-26 Jiawei Tan, Pingan Yang, Lu Chen, Hongxing Wang
Once a video sequence is organized as basic shot units, it is of great interest to temporally link shots into semantic-compact scene segments to facilitate long video understanding. However, it still challenges existing video scene boundary detection methods to handle various visual semantics and complex shot relations in video scenes. We propose a novel self-supervised learning method, Video Scene
-
Discriminative Segment Focus Network for Fine-grained Video Action Recognition ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-26 Baoli Sun, Xinchen Ye, Tiantian Yan, Zhihui Wang, Haojie Li, Zhiyong Wang
Fine-grained video action recognition aims to identify minor and discriminative variations among fine categories of actions. While many recent action recognition methods have been proposed to better model spatio-temporal representations, how to model the interactions among discriminative atomic actions to effectively characterize inter-class and intra-class variations has been neglected, which is vital
-
Efficient Brain Tumor Segmentation with Lightweight Separable Spatial Convolutional Network ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-23 Hao Zhang, Meng Liu, Yuan Qi, Yang Ning, Shunbo Hu, Liqiang Nie, Wenyin Zhang
Accurate and automated segmentation of lesions in brain MRI scans is crucial in diagnostics and treatment planning. Despite the significant achievements of existing approaches, they often require substantial computational resources and fail to fully exploit the synergy between low-level and high-level features. To address these challenges, we introduce the Separable Spatial Convolutional Network (SSCN)
-
Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-22 Xiaobo Hu, Youfang Lin, HeHe Fan, Shuo Wang, Zhihao Wu, Kai Lv
Given an object of interest, visual navigation aims to reach the object’s location based on a sequence of partial observations. To this end, an agent needs to 1) acquire specific knowledge about the relations of object categories in the world during training and 2) locate the target object based on the pre-learned object category relations and its trajectory in the current unseen environment. In this
-
Recoverable Privacy-Preserving Image Classification through Noise-like Adversarial Examples ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-21 Jun Liu, Jiantao Zhou, Jinyu Tian, Weiwei Sun
With the increasing prevalence of cloud computing platforms, ensuring data privacy during the cloud-based image-related services such as classification has become crucial. In this study, we propose a novel privacy-preserving image classification scheme that enables the direct application of classifiers trained in the plaintext domain to classify encrypted images, without the need of retraining a dedicated
-
New Metrics and Dataset for Biological Development Video Generation ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-20 P. Celard, E. L. Iglesias, J. M. Sorribes-Fdez, L. Borrajo, A. Seara Vieira
Image generative models have advanced in many areas to produce synthetic images of high resolution and detail. This success has enabled their use in the biomedical field, paving the way for the generation of videos showing the biological evolution of its content. Despite the power of generative video models, their use has not yet extended to time-based development, focusing almost exclusively on generating
-
Feature Extraction Matters More: An Effective and Efficient Universal Deepfake Disruptor ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-20 Long Tang, Dengpan Ye, Zhenhao Lu, Yunming Zhang, Chuanxi Chen
Face manipulation can modify a victim’s facial attributes, e.g., age or hair color, in an image, which is an important component of DeepFakes. Adversarial examples are an emerging approach to combat the threat of visual misinformation to society. To efficiently protect facial images from being forged, designing a universal face anti-manipulation disruptor is essential. However, existing works treat
-
Make Partition Fit Task: A Novel Framework for Joint Learning of City Region Partition and Representation ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-17 Mingyu Deng, Wanyi Zhang, Jie Zhao, Zhu Wang, Mingliang Zhou, Jun Luo, Chao Chen
The proliferation of multimodal big data in cities provides unprecedented opportunities for modeling and forecasting urban problems, e.g., crime prediction and house price prediction, through data-driven approaches. A fundamental and critical issue in modeling and forecasting urban problems lies in identifying suitable spatial analysis units, also known as city region partition. Existing works rely
-
Generative Adversarial Networks with Learnable Auxiliary Module for Image Synthesis ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-17 Yan Gan, Chenxue Yang, Mao Ye, Renjie Huang, Deqiang Ouyang
Training generative adversarial networks (GANs) for noise-to-image synthesis is a challenging task, primarily due to the instability of GANs’ training process. One of the key issues is the generator’s sensitivity to input data, which can cause sudden fluctuations in the generator’s loss value with certain inputs. This sensitivity suggests an inadequate ability to resist disturbances in the generator
-
Multi-Agent DRL-based Multipath Scheduling for Video Streaming with QUIC ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-15 Xueqiang Han, Biao Han, Jinrong Li, Congxi Song
The popularization of video streaming brings challenges in satisfying diverse Quality of Service (QoS) requirements. The multipath extension of the Quick UDP Internet Connection (QUIC) protocol, also called MPQUIC, has the potential to improve video streaming performance with multiple simultaneously transmitting paths. The multipath scheduler of MPQUIC determines how to distribute the packets onto
-
Realizing Efficient On-Device Language-based Image Retrieval ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-15 Zhiming Hu, Mete Kemertas, Lan Xiao, Caleb Phillips, Iqbal Mohomed, Afsaneh Fazly
Advances in deep learning have enabled accurate language-based search and retrieval, e.g., over user photos, in the cloud. Many users prefer to store their photos at home due to privacy concerns. As such, a need arises for models that can perform cross-modal search on resource-limited devices. State-of-the-art cross-modal retrieval models achieve high accuracy through learning entangled representations
-
Invisible Adversarial Watermarking: A Novel Security Mechanism for Enhancing Copyright Protection ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-14 Jinwei Wang, Haihua Wang, Jiawei Zhang, Hao Wu, Xiangyang Luo, Bin Ma
Invisible watermarking can be used as an important tool for copyright certification in the Metaverse. However, with the advent of deep learning, Deep Neural Networks (DNNs) have posed new threats to this technique. For example, artificially trained DNNs can perform unauthorized content analysis and achieve illegal access to protected images. Furthermore, some specially crafted DNNs may even erase invisible
-
Audio-Visual Contrastive Pre-train for Face Forgery Detection ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-13 Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Weiming Zhang, Ying Guo, Zhen Cheng, Pengfei Yan, Nenghai Yu
Highly realistic avatars in the metaverse may lead to severe leakage of facial privacy. Malicious users can more easily obtain the 3D structure of faces and then use deepfake technology to create counterfeit videos with higher realism. To automatically discern facial videos forged with the advancing generation techniques, deepfake detectors need to achieve stronger generalization abilities. Inspired
-
Automatic Lyric Transcription and Automatic Music Transcription from Multimodal Singing ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-12 Xiangming Gu, Longshen Ou, Wei Zeng, Jianan Zhang, Nicholas Wong, Ye Wang
Automatic lyric transcription (ALT) refers to transcribing singing voices into lyrics while automatic music transcription (AMT) refers to transcribing singing voices into note events, i.e., musical MIDI notes. Despite these two tasks having significant potential for practical application, they are still nascent. This is because the transcription of lyrics and note events solely from singing audio is
-
Suitable and Style-consistent Multi-texture Recommendation for Cartoon Illustrations ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-12 Huisi Wu, Zhaoze Wang, Yifan Li, Xueting Liu, Tong-Yee Lee
Texture plays an important role in cartoon illustrations to display object materials and enrich visual experiences. Unfortunately, manually designing and drawing an appropriate texture is not easy even for proficient artists, let alone novices or amateurs. While there exist tons of textures on the Internet, it is not easy to pick an appropriate one using traditional text-based search engines.
-
Mastering Deepfake Detection: A Cutting-Edge Approach to Distinguish GAN and Diffusion-Model Images ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-09 Luca Guarnera, Oliver Giudice, Sebastiano Battiato
Detecting and recognizing deepfakes is a pressing issue in the digital age. In this study, we first collected a dataset of pristine images and fake ones properly generated by nine different Generative Adversarial Network (GAN) architectures and four Diffusion Models (DM). The dataset contained a total of 83,000 images, with equal distribution between the real and deepfake data. Then, to address different
-
Backdoor Two-Stream Video Models on Federated Learning ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-07 Jing Zhao, Hongwei Yang, Hui He, Jie Peng, Weizhe Zhang, Jiangqun Ni, Arun Kumar Sangaiah, Aniello Castiglione
Video models on federated learning (FL) enable continual learning of the involved models for video tasks on end-user devices while protecting the privacy of end-user data. As a result, security issues in FL, e.g., backdoor attacks on FL and their defenses, have increasingly become the subject of extensive research in recent years. The backdoor attacks on FL are a class of poisoning attacks
-
Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-07 ZhiHao Zhang, Jun Wang, Zhuli Zang, Lei Jin, Shengjie Li, Hao Wu, Jian Zhao, Zhang Bo
Visual tracking is a fundamental task in computer vision with significant practical applications in various domains, including surveillance, security, robotics, and human-computer interaction. However, it may face limitations in visible light data, such as low-light environments, occlusion, and camouflage, which can significantly reduce its accuracy. To cope with these challenges, researchers have
-
A Bitcoin-based Secure Outsourcing Scheme for Optimization Problem in Multimedia Internet of Things ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-08 Wenyuan Yang, Shaocong Wu, Jianwei Fei, Xianwang Zeng, Yuemin Ding, Zhihua Xia
With the development of the Internet of Things (IoT) and cloud computing, various multimedia data such as audio, video, and images have experienced explosive growth, ushering in the era of big data. Large-scale computing tasks in the Multimedia Internet of Things (M-IoT), such as mathematical optimization problems, have begun to be outsourced from IoT devices with limited computing power to cloud servers