Benchmarks - oobo

Overview

We tested whether accumulated engineering context actually helps AI agents perform better. The answer: yes, significantly. Oobo captures context from every commit: not just the code diff, but why it was built, what was tried, and which patterns work. This memory is made available to agents through MCP tools.

Results

60% More Bugs Fixed

Agents with oobo memory resolve 60% more real-world bugs on SWE-bench tasks

75% Win Rate

When memory makes a difference, oobo wins 3 out of 4 contested cases

2.5x More Accurate

On codebase-specific questions, agents answer correctly 2.5x more often with context

Zero False Alarms

When oobo warns about a risky modification, it’s always right - 0% false positive rate

How We Tested

Setup:

Same agent (Claude Sonnet 4) with identical tools in both conditions
Only difference: one gets oobo’s memory context, one doesn’t
Real engineering experiences from 12 major open-source Python repositories
Evaluated on tasks the agent has never seen (strict no-leakage protocol)
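
To make the no-leakage idea concrete, here is a minimal sketch of the kind of check such a protocol implies: no memory item may be derived from a task in the evaluation set. The `Experience` record, its field names, and the issue IDs are hypothetical and used only for illustration. This is not oobo's actual schema or pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    """A stored memory item. Field names are hypothetical."""
    repo: str
    source_issue: str  # the issue this experience was derived from


def find_leakage(memory, eval_task_ids):
    """Return every memory item that was derived from an evaluation task."""
    held_out = set(eval_task_ids)
    return [m for m in memory if m.source_issue in held_out]


# Toy data, invented for this example only.
memory = [
    Experience("example/repo", "issue-101"),
    Experience("example/repo", "issue-102"),
]
eval_tasks = ["issue-201", "issue-202"]

leaked = find_leakage(memory, eval_tasks)
assert leaked == [], f"leakage detected: {leaked}"
```

Running a check like this before evaluation fails fast if any held-out task has contributed to the memory the agent can see.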

Benchmark: SWE-bench, a set of real GitHub issues with verified gold-standard solutions.

Evaluation: Full agentic loop. Agents iterate with file reading, code search, editing, and bash commands to produce patches. The two conditions were compared A/B using an LLM judge plus factual verification.
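
As a rough illustration of how paired A/B outcomes turn into figures such as a win rate or a relative improvement, consider the sketch below. It assumes a "contested" case is one where the two conditions disagree on whether the task was resolved; the page does not define the term, so that reading is an assumption. All numbers are toy values chosen for the example, not benchmark data.

```python
def win_rate(pairs):
    """Share of contested cases won by the memory condition.

    pairs: list of (with_memory_resolved, baseline_resolved) booleans.
    A case counts as contested when the two outcomes differ (assumed definition).
    Returns None when there are no contested cases.
    """
    contested = [(a, b) for a, b in pairs if a != b]
    if not contested:
        return None
    wins = sum(1 for a, _ in contested if a)
    return wins / len(contested)


def relative_gain(with_memory_count, baseline_count):
    """Relative improvement of the memory condition over the baseline."""
    if baseline_count == 0:
        raise ValueError("baseline_count must be positive")
    return (with_memory_count - baseline_count) / baseline_count


# Toy outcomes: 3 wins, 1 loss, and 4 ties (ties are not contested).
toy_pairs = (
    [(True, False)] * 3
    + [(False, True)]
    + [(True, True)] * 2
    + [(False, False)] * 2
)

print(f"win rate on contested cases: {win_rate(toy_pairs):.0%}")
print(f"relative gain for 8 vs 5 resolved: {relative_gain(8, 5):.0%}")
```

Reporting the win rate over contested cases only keeps ties, where memory made no difference either way, from diluting the comparison.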

What This Means

For Engineering Teams

Every commit your team makes becomes searchable intelligence. When an agent encounters something similar months later, oobo provides the context: which files to look at, what patterns worked, what broke last time.

For AI Agent Performance

Agents without memory waste iterations exploring dead ends. With oobo context, agents navigate directly to the relevant code, producing fixes they’d otherwise miss entirely. The cost overhead is near zero.

For Code Quality

Zero false alarms on regression warnings. When oobo surfaces a risk, it’s always based on real past experience - not heuristics or guesses.

Methodology

Parameter	Value
Dataset	SWE-bench (2294 tasks, 12 Python repositories)
Model	Claude Sonnet 4
Embedding	OpenAI text-embedding-3-large
Search	pgvector cosine similarity
Protocol	Strict train/test split, no data leakage
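
For readers unfamiliar with cosine-similarity search, the following is a minimal pure-Python sketch of the ranking step. It is not oobo's implementation: in the setup above the embeddings would come from the embedding model and the ranking would run inside Postgres via pgvector (whose `<=>` operator returns cosine distance, that is, one minus cosine similarity). The three-dimensional vectors and commit messages here are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, memories, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings"; real ones have thousands of dimensions.
memories = [
    ("fix: handle empty queryset in paginator", [0.9, 0.1, 0.0]),
    ("refactor: split settings module", [0.0, 0.2, 0.9]),
    ("fix: off-by-one in page count", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]

for score, text in top_k(query, memories):
    print(f"{score:.3f}  {text}")
```

Cosine similarity compares direction rather than magnitude, which is why it is a common choice for ranking text embeddings.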

The Oobo Difference

Without oobo: Every agent session starts from zero. It reads files, searches code, tries approaches, hits dead ends, backtracks.

With oobo: The agent inherits accumulated knowledge. Past solutions, known pitfalls, architectural decisions - surfaced automatically at the start of each task.

Every commit enriches the system. No configuration. No training. No manual tagging.
