Summary
Anthropic’s team got in touch with Firefox engineers after using Claude to identify security bugs in our JavaScript engine. Critically, their bug reports included minimal test cases that allowed our security team to quickly verify and reproduce each issue.
Within hours, our platform engineers began landing fixes, and we kicked off a tight collaboration with Anthropic to apply the same technique across the rest of the browser codebase. In total, we discovered 14 high-severity bugs and issued 22 CVEs as a result of this work. All of these bugs are now fixed in the latest version of the browser.
In addition to the 22 security-sensitive bugs, Anthropic discovered 90 other bugs, most of which are now fixed. A number of the lower-severity findings were assertion failures, which overlapped with issues traditionally found through fuzzing, an automated testing technique that feeds software huge numbers of unexpected inputs to trigger crashes and bugs. However, the model also identified distinct classes of logic errors that fuzzers had not previously uncovered.
[...]
The scale of findings reflects the power of combining rigorous engineering with new analysis tools for continuous improvement. We view this as clear evidence that large-scale, AI-assisted analysis is a powerful new addition in security engineers’ toolbox. Firefox has undergone some of the most extensive fuzzing, static analysis, and regular security review over decades. Despite this, the model was able to reveal many previously unknown bugs. This is analogous to the early days of fuzzing; there is likely a substantial backlog of now-discoverable bugs across widely deployed software.