ai-digest.dev
last updated 5 h ago
Multimodal · arXiv cs.AI · 21 h ago

Whisper-GPT -- Continuous Discrete Hybrid Representation Language Models For Speech And Music

Whisper-GPT is a newly proposed generative large language model for speech and music that combines continuous audio representations with discrete tokens in a single architecture, aiming to ease the context-length limitations of high-fidelity generative architectures. By using both spectrograms and discrete acoustic tokens as input, the model improves (that is, lowers) perplexity and negative log-likelihood for next-token prediction on audio tasks. For practitioners building generative audio, speech, and music applications, the hybrid approach offers a way to draw on the strengths of both continuous and discrete representations.
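
For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how they relate: perplexity is the exponential of the mean negative log-likelihood (NLL) of the correct next token. The function names and the probability values are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
import math

def mean_nll(true_token_probs):
    """Mean negative log-likelihood (in nats) of the correct next tokens."""
    return -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)

def perplexity(true_token_probs):
    """Perplexity is exp(mean NLL); lower means better next-token prediction."""
    return math.exp(mean_nll(true_token_probs))

# Hypothetical probabilities that two models assign to the correct acoustic
# token at each step (illustrative numbers, not taken from the paper).
token_only_model = [0.25, 0.25, 0.25, 0.25]
hybrid_model = [0.5, 0.5, 0.5, 0.5]

print(f"token-only: NLL={mean_nll(token_only_model):.3f}, PPL={perplexity(token_only_model):.3f}")
print(f"hybrid:     NLL={mean_nll(hybrid_model):.3f}, PPL={perplexity(hybrid_model):.3f}")
```

A model that assigns higher probability to the correct next token gets both a lower NLL and a lower perplexity, so the two metrics always move together.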

speech · music · llm · relevance 0.00 · engagement 0.00