ai-digest.dev
Multimodal · arXiv cs.AI · 4 d ago

TextHOI-3D: Text-to-3D Hand-Object Interaction via Discrete Multi-View Generation and Joint Mesh Optimization

TextHOI-3D is a framework for generating 3D hand-object interactions from text. It targets two difficulties: preserving the semantics of the input description and producing geometrically accurate hand-object contact. The system generates multi-view observations with a CLIP-conditioned visual autoregressive model that operates in a compact vector-quantized (VQ) token space, then jointly optimizes over those views to recover a single, unified hand-object mesh. On benchmarks, the authors report significant gains in object-contact accuracy and lower hand-object penetration volume. These results suggest that multi-view visual tokens are an effective intermediate representation for practitioners building 3D generative models.
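To make the idea of a "VQ token space" concrete, here is a minimal sketch of the generic vector-quantization step used by VQ-style image tokenizers: each continuous patch feature is replaced by the index of its nearest codebook entry, producing the discrete tokens an autoregressive model can then predict. This is not TextHOI-3D's implementation. The toy 2-D codebook, the feature values, and the function names (`squared_distance`, `quantize`) are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    features: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Map each feature vector to the index of its nearest codebook entry.

    Generic nearest-neighbour VQ lookup (an assumption for illustration,
    not the paper's tokenizer). Returns the discrete token indices and the
    corresponding quantized codebook vectors.
    """
    if not codebook:
        raise ValueError("codebook must be non-empty")
    tokens: List[int] = []
    quantized: List[Vector] = []
    for f in features:
        # argmin over codebook entries by squared Euclidean distance
        idx = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        tokens.append(idx)
        quantized.append(codebook[idx])
    return tokens, quantized


# Hypothetical toy codebook: 4 entries in a 2-D feature space.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Hypothetical patch features extracted from one rendered view.
features = [(0.1, 0.2), (0.9, 0.1), (0.8, 0.95), (0.05, 0.7)]

tokens, quantized = quantize(features, codebook)
print(tokens)  # [0, 1, 3, 2]
```

In a full pipeline of this kind, the token indices from every view would be concatenated into one sequence, and the autoregressive model (conditioned here, per the summary, on a CLIP text embedding) would learn to predict that sequence; the decoded views then feed the mesh optimization. Real tokenizers use learned codebooks with hundreds to thousands of high-dimensional entries rather than four 2-D points.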

3D generation · text-to-3D · hand-object interaction