SWT-Bench Verified – Best Test Generation at 84%

In a series of posts, we will outline some of the core technologies behind LogicStar.

At LogicStar AI, we are building the platform for self-healing software applications, leveraging agentic systems to autonomously identify, reproduce, and fix bugs. This requires rigorous testing and thorough validation of every application behavior to avoid introducing new issues or wasting reviewer time. Therefore, test generation is an area of key importance at LogicStar.

Our vision is to deliver substantial value for commercial applications rather than flashy AI demos. We design LogicStar so that developers do not waste time reviewing partial or almost-correct pull requests.

To drive innovation in test generation, we have developed and open-sourced SWT-Bench, which was also published at NeurIPS 2024. While the popular SWE-Bench requires code agents to fix given issues, SWT-Bench tests their ability to generate effective tests. This allows us to develop agents that excel at test generation. Within LogicStar, we orchestrate these test and code generation agents so that they collaboratively produce well-tested patches for every bug we address.

This system allows our agents to score 84% on SWT-Bench Verified, beating the previous state-of-the-art of 75.8%, held by the OpenHands team. We achieve this performance by combining multiple agents and models, iteratively refining both code and tests. The seamless orchestration of these agents relies heavily on our proprietary technology, including advanced static analysis tools used directly by our agents. Because our agents do not require Internet access, there is no risk of leaking your source code, secrets, or your customers' data. Instead, our agents leverage advanced code search capabilities, iterative feedback driven by code execution with coverage metrics, and static analysis tools developed by LogicStar to build an understanding of the codebase.

We are rolling out our latest agent advancements with selected design partners who share our vision for self-healing applications and are helping us shape the future of this technology. Their collaboration ensures that our research delivers immediate value for commercial software. If you also believe in this direction and work with Python, JavaScript or TypeScript repositories, we invite you to sign up here. We will support you through onboarding and ensure full SOC 2 compliance.