Research

SWE-Lancer Benchmark

1 article in archive

Introducing the SWE-Lancer benchmark

Can frontier LLMs earn $1 million from real-world freelance software engineering?

OpenAI Blog · 395d ago