Research · Reddit r/LocalLLaMA · 12 d ago

OpenMythos benchmarks

The OpenMythos benchmark results have been released, covering SWE-bench Pro, CyberGym, and cybench. Notably, the Qwen 3.6 27B model scored 75 on SWE Verified; this figure differs from scores reported for previous versions because of changes in evaluation methods. The OpenMythos model, designed for cybersecurity tasks, shows promising capabilities, and the results suggest that further training could improve its performance, which makes it relevant to practitioners in AI security.

