Poster
Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation
Tingyu Zhu · Haoyu Liu · Ziyu Wang · Zhimin Jiang · Zeyu Zheng
East Exhibition Hall A-B #E-3408
Developing generative models to create or conditionally create symbolic music presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To address these challenges, we introduce an efficient Fine-Grained Guidance (FGG) approach within diffusion models. FGG guides the diffusion models to generate music that aligns more closely with the control and intent of expert composers, which is critical for improving the accuracy, listenability, and quality of generated music. This approach enables diffusion models to excel in advanced applications such as improvisation and interactive music creation. We derive theoretical characterizations of both the challenges in symbolic music generation and the effects of the FGG approach. We provide numerical experiments and subjective evaluations to demonstrate the effectiveness of our approach. We have published a demo page that showcases performances and enables real-time interactive generation.
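To make the idea of sampling-time fine-grained guidance concrete, below is a minimal sketch of one way such guidance can be applied in a DDPM-style sampler for a piano-roll representation. This is an illustration under assumed conventions, not the authors' exact FGG algorithm: the model interface `model_predict_x0`, the `chord_mask` helper, the schedule constants, and the piano-roll dimensions are all hypothetical stand-ins. The guidance step here simply masks out-of-chord pitches in the predicted clean sample at every reverse step, which captures the spirit of enforcing note-level harmonic constraints.

```python
# Minimal sketch of sampling-time fine-grained guidance for a piano-roll
# diffusion model. Assumptions (not from the paper): a DDPM sampler whose
# denoiser predicts the clean roll x0, and a hard per-step chord constraint
# enforced by masking out-of-chord pitches in that prediction.
import numpy as np

T = 50                      # number of denoising steps (assumed)
N_PITCH, N_FRAME = 128, 64  # piano-roll shape: MIDI pitches x time frames

betas = np.linspace(1e-4, 0.02, T)  # standard DDPM noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def chord_mask(chord_pitch_classes: set[int]) -> np.ndarray:
    """Binary mask keeping only pitches whose pitch class is in the chord."""
    mask = np.zeros((N_PITCH, 1))
    for p in range(N_PITCH):
        if p % 12 in chord_pitch_classes:
            mask[p] = 1.0
    return mask

def model_predict_x0(x_t: np.ndarray, t: int) -> np.ndarray:
    """Stand-in for a trained denoiser that predicts the clean roll x0."""
    return np.clip(x_t, 0.0, 1.0)  # placeholder; a real model goes here

def guided_sample(chord_pitch_classes: set[int], rng=np.random.default_rng(0)):
    mask = chord_mask(chord_pitch_classes)
    x_t = rng.standard_normal((N_PITCH, N_FRAME))
    for t in reversed(range(T)):
        x0_hat = model_predict_x0(x_t, t)
        # Fine-grained guidance: suppress notes outside the target harmony
        # in the predicted clean sample before continuing the reverse step.
        x0_hat = x0_hat * mask
        # DDPM posterior mean given (x_t, x0_hat), plus noise when t > 0.
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[t - 1] if t > 0 else 1.0
        coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        mean = coef_x0 * x0_hat + coef_xt * x_t
        sigma = np.sqrt(betas[t] * (1.0 - ab_prev) / (1.0 - ab_t))
        noise = rng.standard_normal(x_t.shape) if t > 0 else 0.0
        x_t = mean + sigma * noise
    return x_t * mask  # final roll restricted to chord tones

roll = guided_sample({0, 4, 7})  # e.g., constrain notes to a C-major triad
print(roll.shape)  # (128, 64)
```

A soft variant would interpolate toward the masked prediction rather than applying the mask exactly, trading strict harmonic correctness for flexibility; the paper's own formulation should be consulted for the precise guidance rule.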
Creating music with AI is challenging because it requires both creativity and precision. A single wrong note in a melody can ruin the entire piece, much like a typo can change the meaning of a sentence. Yet most existing AI music systems can't reliably follow detailed instructions, especially when the music data is limited.

Our research introduces a new technique called Fine-Grained Guidance, based on diffusion models, that helps AI models generate symbolic music (like sheet music or MIDI) more accurately and in real time. This method ensures that each note fits correctly within the intended harmony and rhythm, just as a skilled musician would.

We tested our method on a dataset of pop music and showed that it produces better-sounding accompaniments than previous systems. Our approach also allows users, even those with limited technical skill, to guide the music generation interactively by specifying chords or melodies.

This makes AI music tools more useful for composers, hobbyists, and educators, and opens the door to more intuitive collaboration between humans and machines in creative fields.