Research
Model Evaluation
2 articles in archive
OpenAI Pioneers Program
Advancing model performance and real world evaluation in applied domains.
OpenAI Blog · 345 days ago
Introducing SWE-bench Verified
We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues.
OpenAI Blog · 584 days ago
