arXiv cs.AI · 7 d ago

Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models

Ex-Omni is an open-source model that extends omni-modal large language models (OLLMs) to generate 3D facial animation alongside speech. A blendshape-aware speech unit generator and a blendshape decoder decouple semantic reasoning from temporal generation, and a token-as-query gated fusion (TQGF) mechanism controls how semantic information is integrated. The model is pre-trained on the InstructS2SF-1200K dataset (1.2 million samples) and is reported to improve audio-visual synchronization and reduce face-generation latency relative to traditional cascaded pipelines, which is relevant to human-computer interaction applications.
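
The summary does not describe how TQGF works internally. As a rough illustration only, the sketch below assumes that "token-as-query" means each speech-unit token acts as an attention query over the LLM's semantic states, and that "gated" means a sigmoid gate scales how much of the attended context is added back. The function name `gated_fusion`, the scalar per-token gate, and the parameters `gate_weights` and `gate_bias` are hypothetical and are not taken from the paper.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gated_fusion(tokens, semantics, gate_weights, gate_bias=0.0):
    """Hypothetical sketch: fuse semantic states into token states.

    tokens:       list of d-dim vectors, used as attention queries
    semantics:    list of d-dim vectors, used as keys and values
    gate_weights: 2*d weights applied to [token, context] (assumed form)
    """
    d = len(tokens[0])
    fused = []
    for q in tokens:
        # Scaled dot-product attention with the token as the query.
        scores = [dot(q, k) / math.sqrt(d) for k in semantics]
        attn = softmax(scores)
        context = [sum(a * v[i] for a, v in zip(attn, semantics))
                   for i in range(d)]
        # Scalar sigmoid gate computed from the token and its context.
        gate = 1.0 / (1.0 + math.exp(-(dot(gate_weights, q + context) + gate_bias)))
        # Gated residual: the gate limits how much semantic context enters.
        fused.append([qi + gate * ci for qi, ci in zip(q, context)])
    return fused
```

With a strongly negative `gate_bias` the gate closes and each token passes through almost unchanged, which is one way a gate of this kind could keep semantic information from overwhelming the temporal stream.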

3d animation · llm · facial animation
