Research · OpenAI Blog — 311 d ago

Estimating worst-case frontier risks of open-weight LLMs

The paper investigates the worst-case frontier risks associated with the open-weight model gpt-oss. It introduces malicious fine-tuning (MFT) as a way to assess the model's maximum capabilities in two domains: biology and cybersecurity. By examining the implications of MFT, the study highlights potential vulnerabilities and risks in the deployment of open-weight LLMs and underscores the need for robust safety measures when integrating such models into sensitive applications. The findings help practitioners understand the security implications of fine-tuning LLMs in high-stakes environments.

gpt-oss · malicious fine-tuning