arXiv cs.AI · 47 d ago

From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

The paper presents a two-stage methodology for deploying Llama-3.2-1B and other decoder-only LLMs on AMD's XDNA 2 NPU, moving from human-guided development to an autonomous agent skill system. The initial deployment of Llama-3.2-1B achieved a 2.2x speedup on prefill and a 4.0x speedup on decode compared to a hand-optimized baseline. The approach enables end-to-end deployment of multiple models with minimal human intervention, demonstrating competitive performance and functional generalization. This matters for practitioners optimizing LLMs for edge inference on resource-constrained hardware.

deployment · llm · spatial NPU · agent · relevance 0.60 · engagement 0.00
