DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
arXiv - CS - Multimedia. Pub Date: 2024-03-20. DOI: arxiv-2403.13667
Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo

Choreographers determine what the dances look like, while cameramen determine the final presentation of dances. Recently, various methods and datasets have showcased the feasibility of dance synthesis. However, camera movement synthesis with music and dance remains an unsolved challenging problem due to the scarcity of paired data. Thus, we present DCM, a new multi-modal 3D dataset, which for the first time combines camera movement with dance motion and music audio. This dataset encompasses 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community, covering 4 music genres. With this dataset, we uncover that dance camera movement is multifaceted and human-centric, and possesses multiple influencing factors, making dance camera synthesis a more challenging task compared to camera or dance synthesis alone. To overcome these difficulties, we propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy. For evaluation, we devise new metrics measuring camera movement quality, diversity, and dancer fidelity. Utilizing these metrics, we conduct extensive experiments on our DCM dataset, providing both quantitative and qualitative evidence showcasing the effectiveness of our DanceCamera3D model. Code and video demos are available at https://github.com/Carmenw1203/DanceCamera3D-Official.
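The abstract names a "condition separation strategy" for the transformer-based diffusion model but does not spell it out. The sketch below is only a generic illustration of one way such separation is commonly done, namely classifier-free-guidance-style sampling with independent weights for the music and dance conditions; the module names, feature dimensions, and guidance weights are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only: condition-separated guidance at sampling time,
# where music and dance-pose conditions are weighted independently.
# All names, shapes, and weights below are hypothetical.
import torch
import torch.nn as nn


class CameraDenoiser(nn.Module):
    """Toy transformer denoiser: predicts noise on a camera-pose sequence
    conditioned on music and dance features (a stand-in, not DanceCamera3D)."""

    def __init__(self, cam_dim=9, music_dim=35, dance_dim=60, d_model=128):
        super().__init__()
        self.cam_in = nn.Linear(cam_dim, d_model)
        self.music_in = nn.Linear(music_dim, d_model)
        self.dance_in = nn.Linear(dance_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, cam_dim)

    def forward(self, noisy_cam, t, music, dance):
        # Conditions are summed into the token stream; zero tensors play the
        # role of a "dropped" (unconditional) branch during guided sampling.
        h = self.cam_in(noisy_cam) + self.music_in(music) + self.dance_in(dance)
        h = h + t.float().view(-1, 1, 1) / 1000.0  # crude timestep embedding
        return self.out(self.encoder(h))


@torch.no_grad()
def guided_noise(model, noisy_cam, t, music, dance, w_music=1.0, w_dance=2.0):
    """Combine separately guided condition branches (hypothetical weights)."""
    uncond = model(noisy_cam, t, torch.zeros_like(music), torch.zeros_like(dance))
    music_only = model(noisy_cam, t, music, torch.zeros_like(dance))
    dance_only = model(noisy_cam, t, torch.zeros_like(music), dance)
    return uncond + w_music * (music_only - uncond) + w_dance * (dance_only - uncond)


if __name__ == "__main__":
    B, T = 2, 90  # batch size and number of frames (assumed)
    model = CameraDenoiser()
    eps = guided_noise(
        model,
        noisy_cam=torch.randn(B, T, 9),
        t=torch.randint(0, 1000, (B,)),
        music=torch.randn(B, T, 35),
        dance=torch.randn(B, T, 60),
    )
    print(eps.shape)  # torch.Size([2, 90, 9])
```

The intent of weighting the dance branch more strongly here is only to mirror the abstract's observation that camera movement is human-centric; the actual loss terms and guidance scheme are described in the paper and repository linked above.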

Updated: 2024-03-21