SWT-Bench

17 Oct 2024

We are excited to share SWT-Bench, the first benchmark for reproducing bugs and validating their fixes based on GitHub issue descriptions. We presented SWT-Bench at two ICML workshops and want to thank everyone who stopped by for their interest, enthusiasm, and the great discussions we had. We now see a community trend toward not only fixing bugs but also generating tests that effectively reproduce them and validate that proposed fixes resolve the issues. We believe this is essential for achieving truly autonomous bug fixing, which is what LogicStar delivers.

In our paper, we demonstrate how any code repair benchmark with a known ground-truth solution can be transformed into a test generation and issue reproduction benchmark. There, the goal is to create a “reproducing test” that fails on the original codebase and passes after the ground-truth fix has been applied. Our analysis shows that code agents excel at this task and outperform dedicated LLM-based test generation methods. Using these tests for code repair also significantly improves precision. To learn more, please check out our preprint.
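As a rough illustration of the "reproducing test" criterion described above, the check could be sketched as follows. This is a minimal sketch, not the SWT-Bench evaluation harness: the names `Outcome`, `RunRecord`, `is_reproducing`, and `reproducing_tests` are hypothetical, and the sketch assumes each generated test has already been run once on the original codebase and once with the ground-truth fix applied.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of one generated test run on both code versions."""
    name: str
    before_fix: Outcome  # result on the original (buggy) codebase
    after_fix: Outcome   # result after the ground-truth fix is applied


def is_reproducing(run: RunRecord) -> bool:
    """A test reproduces the issue if it fails before the fix and passes after it."""
    return run.before_fix is Outcome.FAIL and run.after_fix is Outcome.PASS


def reproducing_tests(runs: list[RunRecord]) -> list[str]:
    """Return the names of all tests that satisfy the fail-to-pass criterion."""
    return [run.name for run in runs if is_reproducing(run)]


if __name__ == "__main__":
    runs = [
        RunRecord("test_issue_reproduced", Outcome.FAIL, Outcome.PASS),
        RunRecord("test_unrelated", Outcome.PASS, Outcome.PASS),
        RunRecord("test_still_broken", Outcome.FAIL, Outcome.FAIL),
    ]
    print(reproducing_tests(runs))
```

Under these assumptions, only a test that fails before the fix and passes after it counts as reproducing the issue. A test that passes on both versions does not exercise the bug, and one that fails on both does not validate the fix.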

LogicStar AI builds on this research to achieve truly autonomous bug fixing that you can trust as much as you trust your top engineers.

