SEED-MUSIC

Affiliation: TikTok/ByteDance

Role: Senior AI Research Scientist. Core architect of the Audio Tokenizer/Vector Quantization algorithm.

Links:
SEED-MUSIC Homepage 
Technical Paper


SEED-MUSIC is a suite of music generation and editing systems designed to produce high-quality music with fine-grained style control. It is a unified framework that leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing.
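
One way to picture the generation half of this framework: an autoregressive language model emits intermediate music tokens, and a diffusion model renders those tokens into audio. The sketch below is purely illustrative; every name in it (lm_generate_tokens, diffusion_render), the 25-tokens-per-second rate, and the 8192-entry codebook are hypothetical stand-ins, not part of any released SEED-MUSIC API.

```python
# Illustrative two-stage pipeline: an autoregressive LM proposes intermediate
# music tokens, then a diffusion model renders them to a waveform.
# All names and constants here are hypothetical, not the SEED-MUSIC API.
import numpy as np


def lm_generate_tokens(prompt_tokens: list[int], n_tokens: int) -> list[int]:
    """Stand-in for the autoregressive token LM (random ids for illustration)."""
    rng = np.random.default_rng(seed=0)
    return prompt_tokens + rng.integers(0, 8192, size=n_tokens).tolist()


def diffusion_render(tokens: list[int], sample_rate: int = 44100,
                     tokens_per_second: int = 25) -> np.ndarray:
    """Stand-in for the diffusion renderer (returns silence of matching length)."""
    n_samples = int(len(tokens) / tokens_per_second * sample_rate)
    return np.zeros(n_samples, dtype=np.float32)


def generate_music(style_prompt: list[int], seconds: int = 30) -> np.ndarray:
    """Generate roughly `seconds` of audio conditioned on a token prompt."""
    tokens = lm_generate_tokens(style_prompt, n_tokens=25 * seconds)
    return diffusion_render(tokens)
```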

SEED-MUSIC supports a wide range of music generation tasks, and readers are encouraged to listen to the demos on the official project page. At a high level, the system supports:

Shortform audio generation: Generate 30-60 second tracks with expressive vocals in multiple languages and across a range of styles. Generation follows the provided lyrics exactly.

Longform audio generation: Generate full-length tracks that maintain melodic, rhythmic, and genre coherence across 3-5 minutes.

Audio Prompting: Users can provide a reference audio track to control the style of the generated audio, or to directly “continue” the reference track.

Instrumental Music Generation: Instrumental generation is handled as a sub-task of vocal music generation, with the vocal and lyric components simply omitted.

Leadsheet Token Generation: Leadsheet tokens are musically interpretable units that encode note pitch, note duration, lyric phonemes, and instrumentation. Like MIDI, they can be interpreted and directly edited by a musician, and they have been optimized for use in LLM and diffusion audio workflows. Instead of generating directly from language descriptions, SEED-MUSIC can generate music defined explicitly by leadsheet tokens (a sketch of what such a token might contain appears after this list).

Non-destructive Audio Post-Processing and Editing: A full mix can be post-processed in two ways: changing the lyrics without altering the melody, and changing the melody without altering the lyrics. All other elements of the mix are left untouched (see the token-level editing sketch after this list).
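
To make the leadsheet-token idea concrete, here is a sketch of the kind of record such a token might bundle. This is a hypothetical encoding for illustration only; the actual SEED-MUSIC token format is not public.

```python
# Hypothetical sketch of the attributes a leadsheet token carries
# (pitch, duration, lyric phoneme, instrumentation). Not the real format.
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeadsheetToken:
    pitch: int              # MIDI note number, e.g. 60 = middle C
    duration: float         # note length in beats
    phoneme: Optional[str]  # lyric phoneme sung on this note; None if instrumental
    instrument: str         # e.g. "vocal", "piano"


# A two-note vocal fragment carrying the lyric "hello":
melody = [
    LeadsheetToken(pitch=64, duration=1.0, phoneme="HH EH", instrument="vocal"),
    LeadsheetToken(pitch=62, duration=1.0, phoneme="L OW", instrument="vocal"),
]
```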
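
The two non-destructive edit modes can likewise be pictured at the token level, continuing the LeadsheetToken sketch just above: a lyric edit swaps phonemes while preserving each note's pitch and duration, and a melody edit swaps pitches while preserving the phonemes. The function names are again hypothetical; the real system applies these edits to the rendered mix, not to raw token lists.

```python
# Hypothetical token-level view of the two edit modes; builds on the
# LeadsheetToken sketch above. Function names are illustrative only.
from dataclasses import replace


def edit_lyrics(notes: list[LeadsheetToken],
                new_phonemes: list[str]) -> list[LeadsheetToken]:
    """Change what is sung; every note's pitch and duration stay fixed."""
    return [replace(tok, phoneme=p) for tok, p in zip(notes, new_phonemes)]


def edit_melody(notes: list[LeadsheetToken],
                new_pitches: list[int]) -> list[LeadsheetToken]:
    """Change the tune; the lyric phonemes and durations stay fixed."""
    return [replace(tok, pitch=p) for tok, p in zip(notes, new_pitches)]


relyriced = edit_lyrics(melody, ["G UH D", "B AY"])  # same tune, new words
remelodied = edit_melody(melody, [67, 65])           # same words, new tune
```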