Estimating worst-case frontier risks of open-weight LLMs
The paper investigates the worst-case frontier risks of releasing gpt-oss as an open-weight model. It introduces malicious fine-tuning (MFT), in which the model is fine-tuned to maximize its capabilities in two high-risk domains, biology and cybersecurity, so that the study can estimate how capable the model could become in an adversary's hands. By examining what MFT can elicit, the study highlights vulnerabilities specific to deploying open-weight LLMs and underscores the need for robust safety measures when integrating such models into sensitive applications. The work is relevant to practitioners who need to understand the security implications of fine-tuning LLMs in high-stakes environments.