Research

PaperBench

1 article in archive

We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.

OpenAI Blog, 352d ago