How we built an autonomous SEO agent that opens its own pull requests

Every SEO audit tool I've used over the past decade has the same fatal flaw: it tells you what's broken, then walks away. You're left with a 47-page PDF, a Slack message from your boss asking 'so what should we do?', and the dawning realization that you'll spend the next two sprints manually fixing schema markup.

We built Stackwise Rank to close that loop. Audits aren't the deliverable — fixes are. Here's how the autonomous agent works under the hood.

The pipeline at 30,000 feet

01. Worker process polls Firestore for queued runs every 3 seconds
02. Claims a run atomically, sets status to running, starts a heartbeat every 15s
03. Dispatches to the appropriate executor (audit, fix, publish, etc.)
04. Executor invokes Claude Sonnet 4.6 via the Agent SDK with a tightly-scoped skill
05. On detection of a fixable issue, the agent clones the repo, branches, edits files, and opens a PR via the GitHub API
06. Result + token usage flushed back to Firestore; user gets email + Slack notification
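To make the claim, dispatch, and record steps concrete (01 to 03 and 06), here is a simplified, synchronous, in-memory sketch. It is an illustration rather than the production worker: the `Run` shape, the status names, and `processNext` are all hypothetical, and a real worker would be async, talk to Firestore, and run on the 3-second poll interval.

```typescript
// Hypothetical run shape; the real Firestore schema is not shown in this post.
type RunKind = "audit" | "fix" | "publish";
type RunStatus = "queued" | "running" | "succeeded" | "failed";

interface Run {
  id: string;
  kind: RunKind;
  status: RunStatus;
  result?: string;
  error?: string;
}

// One executor per run kind. Real executors would be async and invoke the agent.
type Executor = (run: Run) => string;

// Processes at most one queued run. Returns that run, or undefined if nothing is queued.
function processNext(
  runs: Run[],
  executors: Record<RunKind, Executor>,
): Run | undefined {
  const run = runs.find((r) => r.status === "queued");
  if (!run) return undefined;

  run.status = "running"; // in production this step is the atomic claim
  try {
    run.result = executors[run.kind](run); // dispatch by run kind
    run.status = "succeeded";
  } catch (err) {
    // A failing executor marks the run failed instead of crashing the worker.
    run.error = err instanceof Error ? err.message : String(err);
    run.status = "failed";
  }
  return run;
}
```

Keeping the dispatch table as a `Record<RunKind, Executor>` means the compiler complains if a new run kind is added without an executor for it.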

Why agents, not scripts

Our first prototype used hardcoded heuristics: 'If you find an image without alt text, write alt="{{filename}}"'. This produced terrible results: alt text that says 'IMG_2847.jpg' is worse than no alt text. SEO requires judgment.

The agent reads the surrounding context, looks at neighboring text, infers what the image is showing, and writes meaningful alt text. The same is true for meta descriptions, canonical tags, schema fields, and a dozen other patterns. Judgment is the product. Code is the medium.

Why Claude Sonnet 4.6

We benchmarked GPT-4o, Claude Sonnet 4.6, and Gemini 2.5 on a corpus of 200 hand-graded SEO fixes. Human graders rated Claude Sonnet 4.6's fixes 'ship it' in 88% of cases, against 71% for GPT-4o and 63% for Gemini 2.5. The Agent SDK's tool-use ergonomics sealed it.

The skill system

Rather than one monolithic prompt, the agent loads narrow skills on demand. Each skill is a self-contained markdown file describing exactly what it does, what tools it can use, and what success looks like.

```typescript
// Each run loads only the skills it needs.
const skills = await loadSkills([
  "seo-analysis",      // crawls and scores
  "schema-markup-generator",
  "meta-tags-optimizer",
  "broken-link-checker",
]);

const agent = new SdkAgentRunner({
  model: process.env.AUDIT_MODEL ?? "claude-sonnet-4-6",
  skills,
  maxUsd: 1.5,        // hard cost ceiling per run
});

await agent.execute(prompt, { onProgress: hb });
```

The cost ceiling matters. Without it, a runaway agent could rack up $40 in tokens on a single page. We hard-cap every run and abort cleanly if we hit the limit.
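One way to enforce a ceiling like `maxUsd` is to accumulate spend after every model response and throw as soon as the total crosses the limit. The sketch below assumes that approach; `createBudgetGuard`, the `Pricing` shape, and the prices passed in are hypothetical and are not taken from the `SdkAgentRunner` internals.

```typescript
// Hypothetical per-million-token prices; substitute your model's actual rates.
interface Pricing {
  inputUsdPerMTok: number;
  outputUsdPerMTok: number;
}

interface BudgetGuard {
  // Call after each model response with that response's token counts.
  record(inputTokens: number, outputTokens: number): void;
  spentUsd(): number;
}

function createBudgetGuard(maxUsd: number, pricing: Pricing): BudgetGuard {
  let spent = 0;
  return {
    record(inputTokens, outputTokens) {
      spent +=
        (inputTokens * pricing.inputUsdPerMTok +
          outputTokens * pricing.outputUsdPerMTok) /
        1_000_000;
      if (spent > maxUsd) {
        // The caller catches this, marks the run as aborted, and stops the agent loop.
        throw new Error(
          `budget exceeded: spent $${spent.toFixed(4)} of $${maxUsd.toFixed(2)}`,
        );
      }
    },
    spentUsd: () => spent,
  };
}
```

Because the check runs after a response has been billed, a run can overshoot the ceiling by at most one response. Capping output tokens per call keeps that overshoot small.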

Stale run reaping

Workers die. Railway redeploys. Processes OOM. If a run gets claimed and the worker disappears mid-execution, the run would otherwise sit in the 'running' state forever. We solved this with a heartbeat-and-reap pattern:

- Every running job writes a heartbeat to Firestore every 15 seconds
- Every 3 seconds, the worker scans for runs whose last heartbeat is older than 90 seconds
- Stale runs get atomically re-queued for the next available worker
- The original worker, if it comes back, sees that the run's version no longer matches the one it claimed and aborts
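The heartbeat-and-reap steps above can be sketched in memory as follows. This assumes a `version` counter that is bumped on every re-queue, which is one plausible way to get the version mismatch described above. The field names and both functions are hypothetical, and timestamps are passed in explicitly so the logic is easy to test.

```typescript
interface TrackedRun {
  id: string;
  status: "queued" | "running";
  heartbeatAt: number; // epoch milliseconds of the last heartbeat
  version: number;     // assumed counter, bumped whenever the run is re-queued
}

const STALE_AFTER_MS = 90_000; // from the post: reap after 90 seconds of silence

// Re-queues every running job whose heartbeat is too old. Returns the reaped ids.
function reapStale(runs: TrackedRun[], now: number): string[] {
  const reaped: string[] = [];
  for (const run of runs) {
    if (run.status === "running" && now - run.heartbeatAt > STALE_AFTER_MS) {
      run.status = "queued";
      run.version += 1; // invalidates whatever the old worker still holds
      reaped.push(run.id);
    }
  }
  return reaped;
}

// Called by the owning worker every 15 seconds. A false return means the worker
// has lost ownership of the run and should abort.
function heartbeat(run: TrackedRun, claimedVersion: number, now: number): boolean {
  if (run.status !== "running" || run.version !== claimedVersion) return false;
  run.heartbeatAt = now;
  return true;
}
```

With a 15-second heartbeat and a 90-second threshold, a healthy worker can miss several consecutive heartbeats before its run is taken away.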

The hardest bug we hit

Two workers claimed the same run because we forgot to use a Firestore transaction. Result: duplicate PRs opened on the customer's repo. Now every claim uses runTransaction(), and we sleep peacefully.
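The race is easy to reproduce in miniature. The sketch below uses a hypothetical in-memory store whose writes succeed only if the document has not changed since it was read. A revision counter is one simple way to imitate the all-or-nothing guarantee that `runTransaction()` gives. It is not a claim about how Firestore implements it, and every name here is made up for illustration.

```typescript
interface RunDoc {
  status: "queued" | "running";
  workerId?: string;
}

interface Versioned {
  data: RunDoc;
  rev: number;
}

// Hypothetical stand-in for the database, so the sketch runs without Firestore.
class InMemoryRuns {
  private docs = new Map<string, Versioned>();

  put(id: string, data: RunDoc): void {
    this.docs.set(id, { data, rev: 0 });
  }

  read(id: string): Versioned | undefined {
    const doc = this.docs.get(id);
    return doc ? { data: { ...doc.data }, rev: doc.rev } : undefined;
  }

  // Writes only if nobody has committed since `expectedRev` was read.
  commitIfUnchanged(id: string, expectedRev: number, data: RunDoc): boolean {
    const doc = this.docs.get(id);
    if (!doc || doc.rev !== expectedRev) return false;
    this.docs.set(id, { data, rev: doc.rev + 1 });
    return true;
  }
}

// The claim is split into two phases so two workers can be interleaved deterministically.
function prepareClaim(store: InMemoryRuns, id: string): Versioned | undefined {
  const doc = store.read(id);
  return doc && doc.data.status === "queued" ? doc : undefined;
}

function commitClaim(
  store: InMemoryRuns,
  id: string,
  seen: Versioned,
  workerId: string,
): boolean {
  return store.commitIfUnchanged(id, seen.rev, { status: "running", workerId });
}
```

If two workers both call `prepareClaim` before either commits, both see the run as queued, which is exactly the situation that produced the duplicate PRs. Only the first `commitClaim` succeeds; the second fails because the revision moved, so the losing worker polls for another run instead.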

Cost economics

An autonomous audit run with auto-fix averages $0.43 in token costs. We charge $29/mo for Pro with 50 runs included. The math works because most runs lean heavily on prompt caching (cached input tokens cost about 90% less than uncached ones), and the marginal user costs us less than $1.50/mo even at full utilization.

- $0.43: avg cost per audit + auto-fix run
- 90%: prompt cache hit rate
- <$1.50: marginal cost per Pro user at cap

What's next

The agent currently fixes about 60% of detected issues autonomously. The remaining 40% need human judgment — restructuring page hierarchies, choosing between two valid schema patterns, deciding whether to consolidate or split content. We're working on a 'collaborative mode' where the agent proposes options and the user picks.

If you want to see it in action, you can deploy the agent on your own repo in about 90 seconds. Start free, no credit card.