Poster in Affinity Workshop: New In ML
Story2Nanyin: Chordless, Narrative-Driven Heterophonic Multi-Track MIDI Generation
Weixi Zhai
Most recent advances in text-to-music generation rely on chord-based representations rooted in Western tonal logic. While effective for generating harmonically coherent music, such frameworks exhibit limited adaptability to non-Western traditions and restrict users' creative exploration. This reflects a broader representational gap in generative music systems, where culturally diverse structures remain underexplored.

We introduce Story2Nanyin, a novel chordless, narrative-driven symbolic generation system inspired by Nanyin, a heterophonic musical tradition recognized by UNESCO. Our system generates multi-track MIDI compositions across four instruments (Pipa, Sanxian, Dongxiao, Erxian) directly from narrative text, without relying on harmonic structure. To enable this, we propose NanyinTok, a culturally grounded tokenization scheme that captures pitch, rhythm, playing technique, and instrument-specific articulation patterns.

Built upon a BERT–GPT2 encoder–decoder model, Story2Nanyin is fine-tuned using Low-Rank Adaptation (LoRA), enabling efficient transfer from pretrained language models to a low-resource cultural domain. We further enable expert-in-the-loop evaluation through an interactive interface that visualizes instrument roles and melodic structure, and allows inspection of symbolic tokens to verify scale conformity, ornament usage, and heterophonic alignment.

Automatic evaluation demonstrates strong performance on cultural authenticity and narrative responsiveness. By reframing heterophony as a reusable symbolic abstraction, Story2Nanyin offers a new direction for culturally inclusive music generation, addressing an overlooked dimension in multimedia research and opening space for post-chordal, narrative-centered design.
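To make the idea of a chordless, instrument-aware symbolic vocabulary concrete, the following is a minimal illustrative sketch of what a NanyinTok-style event tokenization could look like. The token names, event fields, and vocabulary layout here are hypothetical assumptions for illustration; the abstract does not specify the actual NanyinTok scheme.

```python
# Hypothetical sketch of a NanyinTok-style tokenization: each note event
# carries instrument, pitch, duration, and playing technique, and the four
# heterophonic tracks share one flat vocabulary with no chord symbols.
# All token names below are illustrative assumptions, not the real scheme.

from dataclasses import dataclass
from typing import List


@dataclass
class NoteEvent:
    instrument: str   # e.g. "Pipa", "Sanxian", "Dongxiao", "Erxian"
    pitch: int        # MIDI pitch number
    duration: int     # duration in MIDI ticks
    technique: str    # playing-technique / ornament label, e.g. "tremolo"


def tokenize(events: List[NoteEvent]) -> List[str]:
    """Flatten note events into a chordless symbolic token stream."""
    tokens: List[str] = []
    for ev in events:
        tokens.extend([
            f"INST_{ev.instrument}",   # which of the four tracks this event belongs to
            f"PITCH_{ev.pitch}",       # pitch token (scale conformity can be checked here)
            f"DUR_{ev.duration}",      # rhythmic value
            f"TECH_{ev.technique}",    # articulation/ornament token
        ])
    return tokens


# Two instruments rendering the same pitch with different ornaments,
# i.e. a tiny heterophonic fragment rather than a chord.
events = [
    NoteEvent("Pipa", 62, 480, "tremolo"),
    NoteEvent("Dongxiao", 62, 960, "vibrato"),
]
print(tokenize(events))
```

Because every attribute becomes its own token, an expert inspecting the stream can check scale conformity (`PITCH_*`), ornament usage (`TECH_*`), and cross-track alignment directly on the symbolic sequence, which matches the kind of token-level inspection the interface described above supports.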