OpenAI launches SWE-bench Verified
DIYuan | 2024-08-15 17:12
【數(shù)據(jù)猿 Digest】 OpenAI launches SWE-bench Verified

On August 15, OpenAI introduced SWE-bench Verified, a more reliable benchmark for evaluating code generation. The most important line in the company's blog post: "As our systems get closer to AGI, we need to evaluate them on increasingly challenging tasks." The benchmark is an improved, human-validated subset of the existing SWE-bench, designed to more reliably measure AI models' ability to solve real-world software problems.
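For readers who want to inspect the benchmark directly, below is a minimal sketch using the Hugging Face `datasets` library. It assumes the dataset is published on the Hugging Face Hub under the ID `princeton-nlp/SWE-bench_Verified` with a single `test` split, and that instances carry the field names used by the original SWE-bench release (`instance_id`, `repo`, `problem_statement`); treat those names as assumptions rather than a confirmed schema.

```python
# Minimal sketch: load and inspect SWE-bench Verified.
# Dataset ID, split name, and field names follow the public SWE-bench
# release and are assumptions here, not confirmed by this article.
from datasets import load_dataset

# SWE-bench Verified is distributed as a single evaluation split.
ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

print(f"{len(ds)} verified task instances")  # Verified contains 500 samples

sample = ds[0]
# Each instance pairs a real GitHub issue with the repository state in
# which it must be fixed; a candidate patch is judged by the repo's tests.
print(sample["instance_id"])               # e.g. "astropy__astropy-12907"
print(sample["repo"])                      # source repository
print(sample["problem_statement"][:200])   # issue text shown to the model
```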