Agents
An agent that plans with a frontier model but runs most of its tokens locally (built it for my own dual-3090 rig)
The article describes a customizable agent built for a dual RTX 3090 system. It uses a frontier model for planning and executes most of the work locally. The architecture has three tiers: a planner that uses Codex to decide what to do, a local model (Qwen 3.6 27B) that carries out the tasks, and an optional fallback model (Kimi K2.6) for error handling. With this split, roughly 85-90% of tokens are processed locally, which improves efficiency and cuts the cost of frontier-model usage. Deterministic validation checks confirm that tasks were actually completed correctly.
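The summary does not include code, so the following is only a minimal sketch of how a plan / execute / fall back loop with deterministic validation might be wired. It assumes each model is wrapped as a plain function that returns its output text and a token count. The names `run_task`, `TokenLedger`, `exact_match`, and the stub models are hypothetical, and the "retry locally, then escalate per step" policy is an assumption of this sketch, not the author's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical model interface (an assumption, not a real client API):
# a model takes a prompt and returns (output_text, tokens_used).
ModelFn = Callable[[str], tuple[str, int]]
# Hypothetical deterministic check: (step, output) -> passed?
Validator = Callable[[str, str], bool]


@dataclass
class TokenLedger:
    """Tallies tokens per tier so the locally processed share can be reported."""
    counts: dict[str, int] = field(default_factory=dict)

    def add(self, tier: str, n: int) -> None:
        self.counts[tier] = self.counts.get(tier, 0) + n

    def local_share(self) -> float:
        total = sum(self.counts.values())
        return self.counts.get("local", 0) / total if total else 0.0


def run_task(
    task: str,
    planner: ModelFn,
    executor: ModelFn,
    validate: Validator,
    ledger: TokenLedger,
    fallback: Optional[ModelFn] = None,
    max_local_attempts: int = 2,
) -> list[str]:
    # Tier 1: the planner (frontier model) turns the task into one step per line.
    plan, n = planner(task)
    ledger.add("planner", n)
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    results: list[str] = []
    for step in steps:
        output: Optional[str] = None
        # Tier 2: the local model gets a few attempts, each checked deterministically.
        for _ in range(max_local_attempts):
            candidate, n = executor(step)
            ledger.add("local", n)
            if validate(step, candidate):
                output = candidate
                break
        # Tier 3 (optional): escalate to the fallback model only if local attempts failed.
        if output is None and fallback is not None:
            candidate, n = fallback(step)
            ledger.add("fallback", n)
            if validate(step, candidate):
                output = candidate
        if output is None:
            raise RuntimeError(f"step failed validation: {step!r}")
        results.append(output)
    return results


# Usage with stub models standing in for the real ones (all hypothetical).
def stub_planner(task: str) -> tuple[str, int]:
    return "write a\nwrite b\nwrite c", 100


def stub_local(step: str) -> tuple[str, int]:
    # Simulates the local model failing validation on the last step.
    return ("wrong" if step.endswith("c") else step.upper()), 300


def stub_fallback(step: str) -> tuple[str, int]:
    return step.upper(), 200


def exact_match(step: str, output: str) -> bool:
    # Deterministic check: the output must equal the expected string exactly.
    return output == step.upper()


ledger = TokenLedger()
results = run_task("demo", stub_planner, stub_local, exact_match, ledger, stub_fallback)
print(results, ledger.counts, f"local share: {ledger.local_share():.0%}")
```

The ledger is there because the 85-90% figure in the summary is a share of tokens. In this toy run, 1,200 of 1,500 tokens land on the local tier (80%), since only the failed step is escalated. Escalating per step rather than per task keeps fallback usage small, but that granularity is a choice made for this sketch.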
planning_agent, local_models, frontier_model