Models
A benchmark for tiny LLMs based on a real-world problem: natural language file search (using monkeSearch)
A new benchmark for small language models (under 3 billion parameters) has been introduced through the monkeSearch project, which focuses on natural language file search. The benchmark evaluates models such as Gemma-3 (270M), SmolLM2 (360M), and TinyLlama (1.1B) on their ability to parse queries into structured JSON, assessing file type, temporal awareness, and specificity across 80 queries. Initial results indicate that models between 0.8B and 1.5B parameters outperform those below 0.5B, which suggests that the smaller models could benefit from fine-tuning to perform better in CPU-inference environments.
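To make the evaluation idea concrete, here is a minimal sketch of how one query might be scored: the model's raw text is parsed into JSON and compared field by field against an expected answer. The field names (`file_types`, `time_unit`, `time_value`, `keywords`) and the equal-weight scoring are assumptions made for illustration; they are not taken from monkeSearch or from the benchmark's actual schema.

```python
import json
from typing import Optional

# Hypothetical schema: these field names are assumptions for illustration,
# not the benchmark's actual output format.
FIELDS = ("file_types", "time_unit", "time_value", "keywords")


def parse_model_output(raw: str) -> Optional[dict]:
    """Parse the outermost {...} span in the model's text; None if it is not a JSON object."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def normalize(value):
    """Treat lists as case-insensitive sets and strings as case-insensitive."""
    if isinstance(value, list):
        return frozenset(str(v).lower() for v in value)
    if isinstance(value, str):
        return value.lower()
    return value


def score_query(raw_output: str, expected: dict) -> float:
    """Fraction of fields that match the expected answer (0.0 for unparseable output)."""
    parsed = parse_model_output(raw_output)
    if parsed is None:
        return 0.0
    hits = sum(normalize(parsed.get(f)) == normalize(expected.get(f)) for f in FIELDS)
    return hits / len(FIELDS)


# Example: the model gets the file type and keyword right but the time value wrong.
expected = {"file_types": ["pdf"], "time_unit": "weeks", "time_value": 2, "keywords": ["invoice"]}
raw = 'Sure! {"file_types": ["PDF"], "time_unit": "weeks", "time_value": 3, "keywords": ["invoice"]}'
print(score_query(raw, expected))  # 0.75
```

Averaging such per-query scores over all 80 queries would give one number per model. Small models often wrap JSON in extra chatter or emit invalid JSON, so the parser here treats anything unparseable as a zero rather than raising.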
llm, benchmark, search