SWT-Bench

17 Oct 2024

We are excited to share SWT-Bench, the first benchmark for reproducing bugs and validating their fixes based on GitHub issue descriptions. We presented SWT-Bench at two ICML workshops and would like to thank everyone who stopped by for their interest, their enthusiasm, and the great discussions. We now see a growing trend in the community to focus not only on fixing bugs but also on generating tests that reproduce them and validate that proposed fixes truly resolve the underlying issues. We believe this is essential for truly autonomous bug fixing, which is what LogicStar delivers.

In our paper, we show how any code repair benchmark with a known ground-truth solution can be turned into a benchmark for test generation and issue reproduction. In the resulting benchmark, the goal is to write a “reproducing test” that fails on the original codebase and passes once the ground-truth fix has been applied. Our analysis shows that Code Agents excel at this task and outperform dedicated LLM-based test generation methods. Using these tests during code repair further allows us to significantly improve the precision of the proposed fixes. To learn more, please check out our preprint.

LogicStar AI builds on this research to deliver truly autonomous bug fixing that you can trust as much as you trust your top engineers.
