Latest papers from ACM Transactions on Multimedia Computing, Communications, and Applications (computer science, software engineering journals)

Multi-grained Representation Aggregating Transformer with Gating Cycle for Change Captioning

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-22
Shengbin Yue, Yunbin Tu, Liang Li, Shengxiang Gao, Zhengtao Yu

Change captioning aims to describe the difference within an image pair in natural language, which combines visual comprehension and language generation. Although significant progress has been achieved, perceiving object changes from different perspectives remains a key challenge, especially under drastic viewpoint changes. In this paper, we propose a novel full-attentive

Updated: 2024-04-22

C2: ABR Streaming in Cognizant of Consumption Context for Improved QoE and Resource Usage Tradeoffs

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-18
Cheonjin Park, Chinmaey Shende, Subhabrata Sen, Bing Wang

Smartphones have emerged as ubiquitous platforms for people to consume content in a wide range of consumption contexts (C2), e.g., over cellular or WiFi, playing back audio and video directly on phone or through peripheral devices such as external screens or speakers. In this paper, we argue that a user’s specific C2 is an important factor to consider in Adaptive Bitrate (ABR) streaming. We examine

Updated: 2024-04-18

Seventeen Years of the ACM Transactions on Multimedia Computing, Communications and Applications: A Bibliometric Overview

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-18
Walayat Hussain, Honghao Gao, Rafiul Karim, Abdulmotaleb El Saddik

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) has been dedicated to advancing multimedia research, fostering discoveries, innovations, and practical applications since 2005. The journal consistently publishes top-notch, original research in emerging fields through open submissions, calls for papers, special issues, rigorous review processes, and diverse research

Updated: 2024-04-18

Rank-based Hashing for Effective and Efficient Nearest Neighbor Search for Image Retrieval

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-16
Vinicius Sato Kawai, Lucas Pascotti Valem, Alexandro Baldassin, Edson Borin, Daniel Carlos Guimarães Pedronette, Longin Jan Latecki

The large and growing amount of digital data creates a pressing need for approaches capable of indexing and retrieving multimedia content. A traditional and fundamental challenge consists of effectively and efficiently performing nearest-neighbor searches. After decades of research, several different methods are available, including trees, hashing, and graph-based approaches. Most of the current methods

Updated: 2024-04-16

A Quality-Aware and Obfuscation-Based Data Collection Scheme for Cyber-Physical Metaverse Systems

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-16
Jianheng Tang, Kejia Fan, Wenjie Yin, Shihao Yang, Yajiang Huang, Anfeng Liu, Neal N. Xiong, Mianxiong Dong, Tian Wang, Shaobo Zhang

In pursuit of an immersive virtual experience within the Cyber-Physical Metaverse Systems (CPMS), the construction of Avatars often requires a significant amount of real-world data. Mobile Crowd Sensing (MCS) has emerged as an efficient method for collecting data for CPMS. While progress has been made in protecting the privacy of workers, little attention has been given to safeguarding task privacy

Updated: 2024-04-16

Leveraging Frame- and Feature-Level Progressive Augmentation for Semi-supervised Action Recognition

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-11
Zhewei Tu, Xiangbo Shu, Peng Huang, Rui Yan, Zhenxing Liu, Jiachao Zhang

Semi-supervised action recognition is a challenging yet prospective task due to its low reliance on costly labeled videos. One high-profile solution is to explore frame-level weak/strong augmentations for learning abundant representations, inspired by the FixMatch framework dominating the semi-supervised image classification task. However, such a solution mainly brings perturbations in terms of texture

Updated: 2024-04-11

GANs in the Panorama of Synthetic Data Generation Methods: Application and Evaluation: Enhancing Fake News Detection with GAN-Generated Synthetic Data

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-10
Bruno Vaz, Álvaro Figueira

This paper focuses on the creation and evaluation of synthetic data to address the challenges of imbalanced datasets in machine learning applications (ML), using fake news detection as a case study. We conducted a thorough literature review on generative adversarial networks (GANs) for tabular data, synthetic data generation methods, and synthetic data quality assessment. By augmenting a public news

Updated: 2024-04-10

SigFormer: Sparse Signal-Guided Transformer for Multi-Modal Action Segmentation

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-10
Qi Liu, Xinchen Liu, Kun Liu, Xiaoyan Gu, Wu Liu

Multi-modal human action segmentation is a critical and challenging task with a wide range of applications. Nowadays, the majority of approaches concentrate on the fusion of dense signals (i.e., RGB, optical flow, and depth maps). However, the potential contributions of sparse IoT sensor signals, which can be crucial for achieving accurate recognition, have not been fully explored. To make up for this

Updated: 2024-04-10

DBGAN: Dual Branch Generative Adversarial Network for Multi-modal MRI Translation

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-10
Jun Lyu, Shouang Yan, M. Shamim Hossain

Existing Magnetic resonance imaging (MRI) translation models rely on Generative Adversarial Networks, primarily employing simple convolutional neural networks. Unfortunately, these networks struggle to capture global representations and contextual relationships within MRI images. While the advent of Transformers enables capturing long-range feature dependencies, they often compromise the preservation

Updated: 2024-04-10

Psychology-Guided Environment Aware Network for Discovering Social Interaction Groups from Videos

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-09
Jiaqi Yu, Jinhai Yang, Hua Yang, Renjie Pan, Pingrui Lai, Guangtao Zhai

Social interaction is a common phenomenon in human societies. Different from discovering groups based on the similarity of individuals’ actions, social interaction focuses more on the mutual influence between people. Although people can easily judge whether or not there are social interactions in a real-world scene, it is difficult for an intelligent system to discover social interactions. Initiating

Updated: 2024-04-09

RSUIGM: Realistic Synthetic Underwater Image Generation with Image Formation Model

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-08
Chaitra Desai, Sujay Benur, Ujwala Patil, Uma Mudenagudi

In this paper, we propose to synthesize realistic underwater images with a novel image formation model that uses both downwelling depth and line-of-sight (LOS) distance as cues; we call it the Realistic Synthetic Underwater Image Generation Model (RSUIGM). The light interaction in the ocean is a complex process and demands specific modeling of direct and backscattering phenomena to capture the degradations

Updated: 2024-04-08

High Fidelity Makeup via 2D and 3D Identity Preservation Net

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-08
Jinliang Liu, Zhedong Zheng, Zongxin Yang, Yi Yang

In this paper, we address the challenging makeup transfer task, aiming to transfer makeup from a reference image to a source image while preserving facial geometry and background consistency. Existing deep neural network-based methods have shown promising results in aligning facial parts and transferring makeup textures. However, they often neglect the facial geometry of the source image, leading to

Updated: 2024-04-08

A Self-Defense Copyright Protection Scheme for NFT Image Art Based on Information Embedding

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-06
Fan Wang, Zhangjie Fu, Xiang Zhang

Non-fungible tokens (NFTs) have become a fundamental part of the metaverse ecosystem due to their uniqueness and immutability. However, existing copyright protection schemes for NFT image art rely on the NFTs themselves, which are minted by third-party platforms. A minted NFT image art only tracks and verifies the entire transaction process, but the legitimacy of the source and ownership of its mapped digital image

Updated: 2024-04-06

Facial soft-biometrics obfuscation through adversarial attacks

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-06
Vincenzo Carletti, Pasquale Foggia, Antonio Greco, Alessia Saggese, Mario Vento

Sharing facial pictures through online services, especially on social networks, has become a common habit for thousands of users. This practice hides a possible threat to privacy: the owners of such services, as well as malicious users, could automatically extract information from faces using modern and effective neural networks. In this paper, we propose the harmless use of adversarial attacks, i

Updated: 2024-04-06

MEDUSA: A Dynamic Codec Switching Approach in HTTP Adaptive Streaming

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-05
Daniele Lorenzi, Farzad Tashtarian, Hermann Hellwagner, Christian Timmerer

HTTP Adaptive Streaming (HAS) solutions utilize various Adaptive BitRate (ABR) algorithms to dynamically select appropriate video representations, aiming to adapt to fluctuations in network bandwidth. However, current ABR implementations have a limitation in that they are designed to function with one set of video representations, i.e., the bitrate ladder, which differ in bitrate and resolution, but

Updated: 2024-04-05
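To make the MEDUSA abstract's notion of a bitrate ladder concrete, here is a minimal sketch of a generic throughput-based ABR rule. It is not the MEDUSA method. The function name `pick_bitrate`, the `safety` factor, and the ladder values are all assumptions chosen for illustration.

```python
from typing import Sequence


def pick_bitrate(ladder_kbps: Sequence[int],
                 throughput_kbps: float,
                 safety: float = 0.8) -> int:
    """Return the highest ladder bitrate that fits within safety * throughput.

    Falls back to the lowest rung when no representation fits the budget.
    """
    if not ladder_kbps:
        raise ValueError("bitrate ladder must not be empty")
    budget = safety * throughput_kbps
    candidates = [b for b in ladder_kbps if b <= budget]
    return max(candidates) if candidates else min(ladder_kbps)


# Hypothetical single-codec ladder (kbps); real ladders also vary resolution.
ladder = [300, 750, 1500, 3000, 6000]
for measured in (100, 2500, 10000):
    print(measured, "->", pick_bitrate(ladder, measured))
```

A rule like this works on one ladder at a time, which is the limitation the abstract points to: the ladder's rungs differ only in bitrate and resolution.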

Heterogeneous Fusion and Integrity Learning Network for RGB-D Salient Object Detection

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-05
Haoran Gao, Yiming Su, Fasheng Wang, Haojie Li

While significant progress has been made in recent years in the field of salient object detection (SOD), there are still limitations in heterogeneous modality fusion and salient feature integrity learning. The former is primarily attributed to a paucity of attention from researchers to the fusion of cross-scale information between different modalities during processing multi-modal heterogeneous data

Updated: 2024-04-05

Universal Relocalizer for Weakly Supervised Referring Expression Grounding

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-04
Panpan Zhang, Meng Liu, Xuemeng Song, Da Cao, Zan Gao, Liqiang Nie

This paper introduces the Universal Relocalizer, a novel approach designed for weakly supervised referring expression grounding. Our method strives to pinpoint a target proposal that corresponds to a specific query, eliminating the need for region-level annotations during training. To bolster the localization precision and enrich the semantic understanding of the target proposal, we devise three key

Updated: 2024-04-05

Multi-Domain Image-to-Image Translation with Cross-Granularity Contrastive Learning

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-04
Huiyuan Fu, Jin Liu, Ting Yu, Xin Wang, Huadong Ma

The objective of multi-domain image-to-image translation is to learn the mapping from a source domain to a target domain in multiple image domains while preserving the content representation of the source domain. Despite the importance and recent efforts, most previous studies disregard the large style discrepancy between images and instances in various domains, or fail to capture instance details

Updated: 2024-04-04

Dual Dynamic Threshold Adjustment Strategy

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-03
Xiruo Jiang, Yazhou Yao, Sheng Liu, Fumin Shen, Liqiang Nie, Xian-Sheng Hua

Loss functions and sample mining strategies are essential components in deep metric learning algorithms. However, the existing loss function or mining strategy often necessitate the incorporation of additional hyperparameters, notably the threshold, which defines whether the sample pair is informative. The threshold provides a stable numerical standard for determining whether to retain the pairs. It

Updated: 2024-04-04

Inter-Camera Identity Discrimination for Unsupervised Person Re-Identification

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-03
Mingfu Xiong, Kaikang Hu, Zhihan Lv, Fei Fang, Zhongyuan Wang, Ruimin Hu, Khan Muhammad

Unsupervised person re-identification (Re-ID) has garnered significant attention because of its data-friendly nature, as it does not require labeled data. Existing approaches primarily address this challenge by employing feature-clustering techniques to generate pseudo-labels. In addition, camera-proxy-based methods have emerged because of their impressive ability to cluster sample identities. However

Updated: 2024-04-03

StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-03
Xiaolong Shen, Zhedong Zheng, Yi Yang

The goal of sign language recognition (SLR) is to help those who are hard of hearing or deaf overcome the communication barrier. Most existing approaches fall into two lines, i.e., skeleton-based and RGB-based methods, but both lines have their limitations. Skeleton-based methods do not consider facial expressions, while RGB-based approaches usually ignore the

Updated: 2024-04-03

Multimodal Score Fusion with Sparse Low Rank Bilinear Pooling for Egocentric Hand Action Recognition

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-02
Kankana Roy

With the advent of egocentric cameras, new challenges arise that traditional computer vision methods are not sufficient to handle. Moreover, egocentric cameras often offer multiple modalities, which need to be modeled jointly to exploit complementary information. In this paper, we propose a sparse low-rank bilinear score pooling approach for egocentric hand action recognition from

Updated: 2024-04-02

Double Reference Guided Interactive 2D and 3D Caricature Generation

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-04-01
Xin Huang, Dong Liang, Hongrui Cai, Yunfeng Bai, Juyong Zhang, Feng Tian, Jinyuan Jia

In this paper, we propose the first geometry and texture (double) referenced interactive 2D and 3D caricature generating and editing method. The main challenge of caricature generation lies in the fact that it not only exaggerates the facial geometry but also refreshes the facial texture. We address this challenge by utilizing the semantic segmentation maps as an intermediary domain, removing the influence

Updated: 2024-04-01

Text-Guided Synthesis of Masked Face Images

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-30
Anjali T, Masilamani V

The COVID-19 pandemic has made us all understand that wearing a face mask protects us from the spread of respiratory viruses. The face authentication systems, which are trained on the basis of facial key points such as the eyes, nose, and mouth, found it difficult to identify the person when the majority of the face is covered by the face mask. Removing the mask for authentication will cause the infection

Updated: 2024-03-30

Effective Video Summarization by Extracting Parameter-free Motion Attention

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-30
Tingting Han, Quan Zhou, Jun Yu, Zhou Yu, Jianhui Zhang, Sicheng Zhao

Video summarization remains a challenging task despite increasing research efforts. Traditional methods focus solely on long-range temporal modeling of video frames, overlooking important local motion information that cannot be captured by frame-level video representations. In this paper, we propose the Parameter-free Motion Attention Module (PMAM) to exploit the crucial motion clues potentially

Updated: 2024-03-30

Paying Attention to Vehicles: A Systematic Review on Transformer-Based Vehicle Re-Identification

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-30
Yan Qian, Johan Barthélemy, Bo Du, Jun Shen

Vehicle re-identification (v-reID) is a crucial and challenging task in the intelligent transportation systems (ITS). While vehicle re-identification plays a role in analysing traffic behaviour, criminal investigation, or automatic toll collection, it is also a key component for the construction of smart cities. With the recent introduction of transformer models and their rapid development in computer

Updated: 2024-03-30

RAC-Chain: An Asynchronous Consensus-based Cross-chain Approach to Scalable Blockchain for Metaverse

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Tianxiu Xie, Keke Gai, Liehuang Zhu, Shuo Wang, Zijian Zhang

The metaverse, as an emerging technical term, conceptually aims to construct a virtual digital space that runs parallel to the physical world. Due to human behaviors and interactions being represented in the virtual world, security in the metaverse is a challenging issue in which the traditional centralized service model is one of the threat sources. To conquer the obstacle caused by centralized computing

Updated: 2024-03-29

A Privacy-preserving Auction Mechanism for Learning Model as an NFT in Blockchain-driven Metaverse

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Qinnan Zhang, Zehui Xiong, Jianming Zhu, Sheng Gao, Wanting Yang

The Metaverse, envisioned as the next-generation Internet, will be constructed via twining a practical world in a virtual form, wherein Metaverse service providers (MSPs) are required to collect massive data from Metaverse users (MUs). In this regard, a critical demand exists for MSPs to motivate MUs to contribute computing resources and data while preserving user privacy. Federated learning (FL)

Updated: 2024-03-29

Detection of Adversarial Facial Accessory Presentation Attacks Using Local Face Differential

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Fei Peng, Le Qin, Min Long, Jin Li

To counter adversarial facial accessory presentation attacks (PAs), a detection method based on local face differential is proposed in this article. It extracts the local face differential features from a suspected face image and a reference face image, and then adaptively fuses the differential features of different local face regions to detect adversarial facial accessory PAs. Meanwhile, the principle

Updated: 2024-03-29

Tensorial Evolutionary Optimization for Natural Image Matting

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Si-Chao Lei, Yue-Jiao Gong, Xiao-Lin Xiao, Yi-Cong Zhou, Jun Zhang

Natural image matting has garnered increasing attention in various computer vision applications. The matting problem aims to find the optimal foreground/background (F/B) color pair for each unknown pixel and thus obtain an alpha matte indicating the opacity of the foreground object. This problem is typically modeled as a large-scale pixel pair combinatorial optimization (PPCO) problem. Heuristic optimization

Updated: 2024-03-29
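The link between a foreground/background (F/B) color pair and the alpha matte in the image matting abstract rests on the standard compositing equation I = alpha * F + (1 - alpha) * B. The sketch below recovers alpha for one pixel by projecting onto the F-B line, assuming a candidate pair is already given. It does not implement the tensorial evolutionary optimization. The function names are illustrative assumptions.

```python
from typing import Sequence, Tuple

Color = Sequence[float]


def composite(alpha: float, fg: Color, bg: Color) -> Tuple[float, ...]:
    """Blend foreground and background: I = alpha * F + (1 - alpha) * B."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def estimate_alpha(pixel: Color, fg: Color, bg: Color) -> float:
    """Least-squares alpha for one pixel given a candidate F/B pair.

    Projects (I - B) onto (F - B) and clamps the result to [0, 1].
    """
    diff = [f - b for f, b in zip(fg, bg)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # F and B are identical, so alpha is undetermined; return 0 by convention.
        return 0.0
    num = sum((p - b) * d for p, b, d in zip(pixel, bg, diff))
    return min(1.0, max(0.0, num / denom))


# Round trip: composite with a known alpha, then recover it.
fg, bg = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
observed = composite(0.25, fg, bg)
print(estimate_alpha(observed, fg, bg))
```

Because each unknown pixel needs its own F/B pair, choosing pairs for all pixels at once becomes a large combinatorial search, which is why the abstract frames matting as an optimization problem.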

ISF-GAN: Imagine, Select, and Fuse with GPT-Based Text Enrichment for Text-to-Image Synthesis

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-28
Yefei Sheng, Ming Tao, Jie Wang, Bing-Kun Bao

Text-to-Image synthesis aims to generate an accurate and semantically consistent image from a given text description. However, it is difficult for existing generative methods to generate semantically complete images from a single piece of text. Some works try to expand the input text to multiple captions via retrieving similar descriptions of the input text from the training set, but still fail to

Updated: 2024-03-28

4D Facial Expression Diffusion Model

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-28
Kaifeng Zou, Sylvain Faisan, Boyang Yu, Sébastien Valette, Hyewon Seo

Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. The challenging task, traditionally having relied heavily on digital craftspersons, remains yet to be explored. In this paper, we introduce a generative framework for generating 3D facial expression sequences (i.e. 4D faces) that can be conditioned on different

Updated: 2024-03-28

A Unified Framework for Jointly Compressing Visual and Semantic Data

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-28
Shizhan Liu, Weiyao Lin, Yihang Chen, Yufeng Zhang, Wenrui Dai, John See, Hongkai Xiong

The rapid advancement of multimedia and imaging technologies has resulted in increasingly diverse visual and semantic data. A large range of applications such as remote-assisted driving requires the amalgamated storage and transmission of various visual and semantic data. However, existing works suffer from the limitation of insufficiently exploiting the redundancy between different types of data.

Updated: 2024-03-28

Semantics and Non-fungible Tokens for Copyright Management on the Metaverse and Beyond

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Roberto García, Ana Cediel, Mercè Teixidó, Rosa Gil

Recent initiatives related to the Metaverse focus on better visualization, like augmented or virtual reality, but also persistent digital objects. To guarantee real ownership of these digital objects, open systems based on public blockchains and Non-Fungible Tokens (NFTs) are emerging together with a nascent decentralized and open creator economy. To manage this emerging economy in a more organized

Updated: 2024-03-28

HCNCT: A Cross-chain Interaction Scheme for the Blockchain-based Metaverse

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Yongjun Ren, Zhiying Lv, Neal N. Xiong, Jin Wang

As a new type of digital living space that blends virtual and reality, Metaverse combines many emerging technologies. It provides an immersive experience based on VR technology and stores and protects users’ digital content and digital assets through blockchain technology. However, different virtual environments are often highly heterogeneous in terms of underlying architecture and software implementation

Updated: 2024-03-28

QuickCSGModeling: Quick CSG Operations Based on Fusing Signed Distance Fields for VR Modeling

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Shuangmin Chen, Rui Xu, Jian Xu, Shiqing Xin, Changhe Tu, Chenglei Yang, Lin Lu

The latest advancements in Virtual Reality (VR) enable the creation of 3D models within a holographic immersive simulation environment. In this article, we create QuickCSGModeling, a user-friendly mid-air interactive modeling system. We first prepare a dataset consisting of diverse components and precompute the discrete signed distance function (SDF) for each component. During the modeling phase, users

Updated: 2024-03-28
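The QuickCSGModeling abstract combines components through their signed distance functions (SDFs). A minimal sketch of the textbook approach uses min and max, assuming analytic sphere SDFs rather than the precomputed discrete SDFs the abstract mentions. It is not the QuickCSGModeling system, and the helper names are illustrative assumptions.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float, float]
SDF = Callable[[Point], float]


def sphere_sdf(center: Point, radius: float) -> SDF:
    """Signed distance to a sphere: negative inside, positive outside."""
    cx, cy, cz = center

    def sdf(p: Point) -> float:
        x, y, z = p
        return math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2) - radius

    return sdf


def union(a: SDF, b: SDF) -> SDF:
    """Inside either shape: pointwise minimum."""
    return lambda p: min(a(p), b(p))


def intersection(a: SDF, b: SDF) -> SDF:
    """Inside both shapes: pointwise maximum."""
    return lambda p: max(a(p), b(p))


def difference(a: SDF, b: SDF) -> SDF:
    """Inside a but not b: intersect a with the complement of b."""
    return lambda p: max(a(p), -b(p))


left = sphere_sdf((0.0, 0.0, 0.0), 1.0)
right = sphere_sdf((1.0, 0.0, 0.0), 1.0)
probe = (0.5, 0.0, 0.0)  # lies inside both spheres
print(union(left, right)(probe), intersection(left, right)(probe),
      difference(left, right)(probe))
```

The min/max combinations keep the sign correct everywhere, so inside/outside tests stay exact, but the magnitude is in general only a bound on the true distance.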

MIS: A Multi-Identifier Management and Resolution System in the Metaverse

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Han Wang, Hui Li, Abla Smahi, Feng Zhao, Yao Yao, Ching Chuen Chan, Shiyu Wang, Wenyuan Yang, Shuo-Yen Robert Li

The metaverse gradually evolves into a virtual world containing a series of interconnected sub-metaverses. Diverse digital resources, including identities, contents, services, and supporting data, are key components of the sub-metaverse. Therefore, a Domain Name System (DNS)-like system is necessary for efficient management and resolution. However, the legacy DNS was designed with security vulnerabilities

Updated: 2024-03-28

Multimodal Visual-Semantic Representations Learning for Scene Text Recognition

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Xinjian Gao, Ye Pang, Yuyu Liu, Maokun Han, Jun Yu, Wei Wang, Yuanxu Chen

Scene Text Recognition (STR), the critical step in OCR systems, has attracted much attention in computer vision. Recent research on modeling textual semantics with Language Model (LM) has witnessed remarkable progress. However, LM only optimizes the joint probability of the estimated characters generated from the Vision Model (VM) in a single language modality, ignoring the visual-semantic relations

Updated: 2024-03-28

Joint Distortion Restoration and Quality Feature Learning for No-reference Image Quality Assessment

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Jifan Yang, Zhongyuan Wang, Baojin Huang, Jiaxin Ai, Yuhong Yang, Zixiang Xiong

No-reference image quality assessment (NR-IQA) methods, inspired by the free energy principle, improve the accuracy of image quality prediction by simulating the human brain’s repair process for distorted images. However, existing methods use separate optimization schemes for distortion restoration and quality prediction, which undermines the accurate mapping of feature representations to quality scores

Updated: 2024-03-28

Scene Graph Lossless Compression with Adaptive Prediction for Objects and Relations

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Weiyao Lin, Yufeng Zhang, Wenrui Dai, Huabin Liu, John See, Hongkai Xiong

The scene graph is a novel data structure describing objects and their pairwise relationship within image scenes. As the size of scene graphs in vision and multimedia applications increases, the need for lossless storage and transmission of such data becomes more critical. However, the compression of scene graphs is less studied because of the complicated data structures involved and complex distributions

Updated: 2024-03-28

Real-time Attentive Dilated U-Net for Extremely Dark Image Enhancement

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-27
Junjian Huang, Hao Ren, Shulin Liu, Yong Liu, Chuanlu Lv, Jiawen Lu, Changyong Xie, Hong Lu

Images taken under low-light conditions suffer from poor visibility, color distortion and graininess, all of which degrade the image quality and hamper the performance of downstream vision tasks, such as object detection and instance segmentation in the field of autonomous driving, making low-light enhancement an indispensable basic component of high-level visual tasks. Low-light enhancement aims to

Updated: 2024-03-27

Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-26
Jiawei Tan, Pingan Yang, Lu Chen, Hongxing Wang

Once a video sequence is organized as basic shot units, it is of great interest to temporally link shots into semantic-compact scene segments to facilitate long video understanding. However, it still challenges existing video scene boundary detection methods to handle various visual semantics and complex shot relations in video scenes. We propose a novel self-supervised learning method, Video Scene

Updated: 2024-03-26

Discriminative Segment Focus Network for Fine-grained Video Action Recognition

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-26
Baoli Sun, Xinchen Ye, Tiantian Yan, Zhihui Wang, Haojie Li, Zhiyong Wang

Fine-grained video action recognition aims to identify minor and discriminative variations among fine categories of actions. While many recent action recognition methods have been proposed to better model spatio-temporal representations, how to model the interactions among discriminative atomic actions to effectively characterize inter-class and intra-class variations has been neglected, which is vital

Updated: 2024-03-26

Efficient Brain Tumor Segmentation with Lightweight Separable Spatial Convolutional Network

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-23
Hao Zhang, Meng Liu, Yuan Qi, Yang Ning, Shunbo Hu, Liqiang Nie, Wenyin Zhang

Accurate and automated segmentation of lesions in brain MRI scans is crucial in diagnostics and treatment planning. Despite the significant achievements of existing approaches, they often require substantial computational resources and fail to fully exploit the synergy between low-level and high-level features. To address these challenges, we introduce the Separable Spatial Convolutional Network (SSCN)

Updated: 2024-03-23

Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-22
Xiaobo Hu, Youfang Lin, HeHe Fan, Shuo Wang, Zhihao Wu, Kai Lv

Given an object of interest, visual navigation aims to reach the object’s location based on a sequence of partial observations. To this end, an agent needs to 1) acquire specific knowledge about the relations of object categories in the world during training and 2) locate the target object based on the pre-learned object category relations and its trajectory in the current unseen environment. In this

Updated: 2024-03-23

Recoverable Privacy-Preserving Image Classification through Noise-like Adversarial Examples

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-21
Jun Liu, Jiantao Zhou, Jinyu Tian, Weiwei Sun

With the increasing prevalence of cloud computing platforms, ensuring data privacy during the cloud-based image-related services such as classification has become crucial. In this study, we propose a novel privacy-preserving image classification scheme that enables the direct application of classifiers trained in the plaintext domain to classify encrypted images, without the need of retraining a dedicated

Updated: 2024-03-23

New Metrics and Dataset for Biological Development Video Generation

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-20
P. Celard, E. L. Iglesias, J. M. Sorribes-Fdez, L. Borrajo, A. Seara Vieira

Image generative models have advanced in many areas to produce synthetic images of high resolution and detail. This success has enabled its use in the biomedical field, paving the way for the generation of videos showing the biological evolution of its content. Despite the power of generative video models, their use has not yet extended to time-based development, focusing almost exclusively on generating

Updated: 2024-03-20

Feature Extraction Matters More: An Effective and Efficient Universal Deepfake Disruptor

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-20
Long Tang, Dengpan Ye, Zhenhao Lu, Yunming Zhang, Chuanxi Chen

Face manipulation can modify a victim’s facial attributes, e.g., age or hair color, in an image, which is an important component of DeepFakes. Adversarial examples are an emerging approach to combat the threat of visual misinformation to society. To efficiently protect facial images from being forged, designing a universal face anti-manipulation disruptor is essential. However, existing works treat

Updated: 2024-03-20

Make Partition Fit Task: A Novel Framework for Joint Learning of City Region Partition and Representation

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-17
Mingyu Deng, Wanyi Zhang, Jie Zhao, Zhu Wang, Mingliang Zhou, Jun Luo, Chao Chen

The proliferation of multimodal big data in cities provides unprecedented opportunities for modeling and forecasting urban problems, e.g., crime prediction and house price prediction, through data-driven approaches. A fundamental and critical issue in modeling and forecasting urban problems lies in identifying suitable spatial analysis units, also known as city region partition. Existing works rely

Updated: 2024-03-19

Generative Adversarial Networks with Learnable Auxiliary Module for Image Synthesis

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-17
Yan Gan, Chenxue Yang, Mao Ye, Renjie Huang, Deqiang Ouyang

Training generative adversarial networks (GANs) for noise-to-image synthesis is a challenging task, primarily due to the instability of GANs’ training process. One of the key issues is the generator’s sensitivity to input data, which can cause sudden fluctuations in the generator’s loss value with certain inputs. This sensitivity suggests an inadequate ability to resist disturbances in the generator

Updated: 2024-03-18

Multi-Agent DRL-based Multipath Scheduling for Video Streaming with QUIC

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-15
Xueqiang Han, Biao Han, Jinrong Li, Congxi Song

The popularization of video streaming brings challenges in satisfying diverse Quality of Service (QoS) requirements. The multipath extension of the Quick UDP Internet Connection (QUIC) protocol, also called MPQUIC, has the potential to improve video streaming performance with multiple simultaneously transmitting paths. The multipath scheduler of MPQUIC determines how to distribute the packets onto

Updated: 2024-03-15

Realizing Efficient On-Device Language-based Image Retrieval

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-15
Zhiming Hu, Mete Kemertas, Lan Xiao, Caleb Phillips, Iqbal Mohomed, Afsaneh Fazly

Advances in deep learning have enabled accurate language-based search and retrieval, e.g., over user photos, in the cloud. Many users prefer to store their photos at home due to privacy concerns. As such, a need arises for models that can perform cross-modal search on resource-limited devices. State-of-the-art cross-modal retrieval models achieve high accuracy through learning entangled representations

Updated: 2024-03-15

Invisible Adversarial Watermarking: A Novel Security Mechanism for Enhancing Copyright Protection

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-14
Jinwei Wang, Haihua Wang, Jiawei Zhang, Hao Wu, Xiangyang Luo, Bin Ma

Invisible watermarking can be used as an important tool for copyright certification in the Metaverse. However, with the advent of deep learning, Deep Neural Networks (DNNs) have posed new threats to this technique. For example, artificially trained DNNs can perform unauthorized content analysis and achieve illegal access to protected images. Furthermore, some specially crafted DNNs may even erase invisible

Updated: 2024-03-14

Audio-Visual Contrastive Pre-train for Face Forgery Detection

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-13
Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Weiming Zhang, Ying Guo, Zhen Cheng, Pengfei Yan, Nenghai Yu

The highly realistic avatar in the metaverse may lead to severe leakage of facial privacy. Malicious users can more easily obtain the 3D structure of faces, thus using Deepfake technology to create counterfeit videos with higher realism. To automatically discern facial videos forged with the advancing generation techniques, deepfake detectors need to achieve stronger generalization abilities. Inspired

Updated: 2024-03-13

Automatic Lyric Transcription and Automatic Music Transcription from Multimodal Singing

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-12
Xiangming Gu, Longshen Ou, Wei Zeng, Jianan Zhang, Nicholas Wong, Ye Wang

Automatic lyric transcription (ALT) refers to transcribing singing voices into lyrics while automatic music transcription (AMT) refers to transcribing singing voices into note events, i.e., musical MIDI notes. Despite these two tasks having significant potential for practical application, they are still nascent. This is because the transcription of lyrics and note events solely from singing audio is

Updated: 2024-03-13

Suitable and Style-consistent Multi-texture Recommendation for Cartoon Illustrations

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-12
Huisi Wu, Zhaoze Wang, Yifan Li, Xueting Liu, Tong-Yee Lee

Texture plays an important role in cartoon illustrations to display object materials and enrich visual experiences. Unfortunately, manually designing and drawing an appropriate texture is not easy even for proficient artists, let alone novices or amateurs. While there are many textures on the Internet, it is not easy to pick an appropriate one using traditional text-based search engines.

Updated: 2024-03-12

Mastering Deepfake Detection: A Cutting-Edge Approach to Distinguish GAN and Diffusion-Model Images

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-09
Luca Guarnera, Oliver Giudice, Sebastiano Battiato

Detecting and recognizing deepfakes is a pressing issue in the digital age. In this study, we first collected a dataset of pristine images and fake ones properly generated by nine different Generative Adversarial Network (GAN) architectures and four Diffusion Models (DM). The dataset contained a total of 83,000 images, with equal distribution between the real and deepfake data. Then, to address different

Updated: 2024-03-09

Backdoor Two-Stream Video Models on Federated Learning

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-07
Jing Zhao, Hongwei Yang, Hui He, Jie Peng, Weizhe Zhang, Jiangqun Ni, Arun Kumar Sangaiah, Aniello Castiglione

Video models on federated learning (FL) enable continual learning of the involved models for video tasks on end-user devices while protecting the privacy of end-user data. As a result, the security issues of FL, e.g., backdoor attacks on FL and their defenses, have increasingly become the subject of extensive research in recent years. The backdoor attacks on FL are a class of poisoning attacks

Updated: 2024-03-08

Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-07
ZhiHao Zhang, Jun Wang, Zhuli Zang, Lei Jin, Shengjie Li, Hao Wu, Jian Zhao, Zhang Bo

Visual tracking is a fundamental task in computer vision with significant practical applications in various domains, including surveillance, security, robotics, and human-computer interaction. However, it may face limitations in visible light data, such as low-light environments, occlusion, and camouflage, which can significantly reduce its accuracy. To cope with these challenges, researchers have

Updated: 2024-03-08

A Bitcoin-based Secure Outsourcing Scheme for Optimization Problem in Multimedia Internet of Things

ACM Trans. Multimed. Comput. Commun. Appl. (IF 5.1) Pub Date : 2024-03-08
Wenyuan Yang, Shaocong Wu, Jianwei Fei, Xianwang Zeng, Yuemin Ding, Zhihua Xia

With the development of the Internet of Things (IoT) and cloud computing, various multimedia data such as audio, video, and images have experienced explosive growth, ushering in the era of big data. Large-scale computing tasks in the Multimedia Internet of Things (M-IoT), such as mathematical optimization problems, have begun to be outsourced from IoT devices with limited computing power to cloud servers

Updated: 2024-03-08