Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls
arXiv - CS - Sound Pub Date : 2024-02-14 , DOI: arxiv-2402.09508
Liwei Lin, Gus Xia, Yixiao Zhang, Junyan Jiang

Controllable music generation plays a vital role in human-AI music co-creation. While Large Language Models (LLMs) have shown promise in generating high-quality music, their focus on autoregressive generation limits their utility in music editing tasks. To bridge this gap, we introduce a novel Parameter-Efficient Fine-Tuning (PEFT) method. This approach enables autoregressive language models to seamlessly address music inpainting tasks. Additionally, our PEFT method integrates frame-level content-based controls, facilitating track-conditioned music refinement and score-conditioned music arrangement. We apply this method to fine-tune MusicGen, a leading autoregressive music generation model. Our experiments demonstrate promising results across multiple music editing tasks, offering more flexible controls for future AI-driven music editing tools. A demo page showcasing our work (https://kikyo-16.github.io/AIR/) and the source code (https://github.com/Kikyo-16/airgen) are available online.


