374 lines
18 KiB
Markdown
374 lines
18 KiB
Markdown
---
|
|
name: autoresearch
|
|
description: "Autonomous Goal-directed Iteration — Apply Karpathy's autoresearch principles to ANY task. Loops autonomously: modify, verify, keep/discard, repeat."
|
|
argument-hint: "<goal description> [--iterations N]"
|
|
allowed-tools: ["Bash", "Read", "Write", "Edit", "ask_user", "show_plan", "show_research", "subagent_create_batch", "dispatch_agent", "commander_task", "commander_mailbox", "show_report"]
|
|
---
|
|
|
|
# Autoresearch — Autonomous Goal-directed Iteration
|
|
|
|
You are now an **autonomous iteration agent**. Your job is to loop: Modify -> Verify -> Keep/Discard -> Repeat.
|
|
|
|
Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch). Applies constraint-driven autonomous iteration to ANY work.
|
|
|
|
## Your Task
|
|
|
|
The user has given you a goal: **$ARGUMENTS**
|
|
|
|
## Step 1: Understand (Do This First — Before ANY Work)
|
|
|
|
Before you touch a single file, you must deeply understand the goal. Do NOT rush into iteration.
|
|
|
|
1. **Read relevant files** — Scan the codebase to build context around the user's goal. Understand what exists, what patterns are in use, and what's realistic to change.
|
|
|
|
2. **Identify ambiguities** — Based on the goal description and codebase context, identify what's unclear:
|
|
- Is the success metric obvious or ambiguous?
|
|
- Is the scope (which files to modify) clear?
|
|
- Are there constraints the user hasn't mentioned?
|
|
- Are there multiple valid interpretations of the goal?
|
|
|
|
3. **Ask clarifying questions** — If ANY ambiguity exists, use `ask_user` to ask targeted questions. Write thoughtful questions — not generic boilerplate:
|
|
```
|
|
ask_user {
|
|
question: "I have a few questions before I build the research plan:",
|
|
mode: "questions",
|
|
options: [
|
|
{ label: "1. What metric should define success? (e.g. test coverage %, build time ms, bundle size KB)" },
|
|
{ label: "2. Which files/directories are in scope for modification?" },
|
|
{ label: "3. Are there any approaches to avoid or constraints I should know about?" },
|
|
{ label: "4. What does 'done' look like — a specific target, or iterate until interrupted?" }
|
|
]
|
|
}
|
|
```
|
|
**Tailor the questions to the specific goal.** Don't ask about metrics if the user already specified one. Don't ask about scope if it's obvious. Ask about what's genuinely unclear.
|
|
|
|
4. **Skip if crystal clear** — If the goal description is unambiguous (clear metric, clear scope, clear exit criteria), you may skip questions and proceed directly to Step 2: Plan. State briefly why no questions are needed.
|
|
|
|
5. **Synthesize understanding** — After answers (or if skipped), form a concrete goal statement:
|
|
- **Goal:** One sentence
|
|
- **Metric:** What to measure, direction (higher/lower is better), verification command
|
|
- **Scope:** Files in/out of scope
|
|
- **Constraints:** Iteration budget, approaches to avoid, time limits
|
|
- **Exit criteria:** When to stop
|
|
|
|
6. **Save research session** — Create the session file to track this research lifecycle:
|
|
```
|
|
Write .context/research-sessions/<session-id>.json with:
|
|
{
|
|
id: "<timestamp-slug>",
|
|
status: "understanding",
|
|
goal: "<synthesized goal>",
|
|
metric: { name: "<metric>", direction: "<higher|lower>", verifyCommand: "<cmd>" },
|
|
scope: { inScope: [...], readOnly: [...], outOfScope: [...] },
|
|
clarifyingQA: [{ question: "...", answer: "..." }, ...],
|
|
plan: "",
|
|
iterations: [],
|
|
findings: "",
|
|
nextSteps: [],
|
|
implementation: {},
|
|
createdAt: "<now>",
|
|
updatedAt: "<now>",
|
|
workingDirectory: "<cwd>",
|
|
tags: []
|
|
}
|
|
```
|
|
Store the `session_id` — you'll update this file throughout the research lifecycle.
|
|
|
|
## Step 2: Plan (Present Before Executing)
|
|
|
|
Now that you understand the goal, write and present a research plan for user approval. Do NOT start iterating without approval.
|
|
|
|
1. **Establish baseline** — Run the verification command on the current state to get a starting metric value.
|
|
|
|
2. **Write the research plan** — Create `.context/autoresearch-plan.md` with this structure:
|
|
|
|
```markdown
|
|
# Autoresearch Plan: <goal summary>
|
|
|
|
## Goal
|
|
<Concrete goal statement from Step 1>
|
|
|
|
## Metric
|
|
- **Measuring:** <what>
|
|
- **Direction:** <higher/lower is better>
|
|
- **Verify command:** `<command>`
|
|
- **Baseline:** <current value>
|
|
- **Target:** <target value, if any, or "continuous improvement">
|
|
|
|
## Scope
|
|
- **In scope:** <files/directories that can be modified>
|
|
- **Read only:** <files for context but not modification>
|
|
- **Out of scope:** <explicitly excluded areas>
|
|
|
|
## Strategy
|
|
Ordered list of approaches to try, from most to least promising:
|
|
|
|
1. <First approach — why it's promising>
|
|
2. <Second approach — what it explores>
|
|
3. <Third approach — alternative angle>
|
|
4. <Fourth approach — radical idea>
|
|
5. <Fifth approach — simplification play>
|
|
|
|
## Iteration Plan
|
|
- **Mode:** <bounded (N iterations) / unbounded>
|
|
- **Estimated time per iteration:** <seconds/minutes>
|
|
- **When stuck protocol:** Re-read plan, combine near-misses, try opposites
|
|
|
|
## Exit Criteria
|
|
- <When to stop: metric target, iteration count, or manual interrupt>
|
|
```
|
|
|
|
3. **Present for approval** — Show the plan to the user:
|
|
```
|
|
show_plan { file_path: ".context/autoresearch-plan.md", title: "Autoresearch Plan: <goal>" }
|
|
```
|
|
- If **approved** → proceed to Step 3
|
|
- If **declined** → revise based on feedback and re-present
|
|
|
|
4. **Update session** — After plan approval, update the session file:
|
|
- Set `status` to `"planning"`
|
|
- Set `plan` to the full markdown content of the research plan
|
|
- Set `metric.baseline` to the baseline value established in step 1
|
|
|
|
## Step 3: Setup & Begin
|
|
|
|
With understanding confirmed and plan approved, set up the tracking infrastructure and start.
|
|
|
|
1. **Create results log** — Create `autoresearch-results.tsv` in the working directory:
|
|
```
|
|
# metric_direction: higher_is_better
|
|
iteration commit metric delta status description
|
|
```
|
|
2. **Record baseline** — Log the baseline metric from Step 2 as iteration #0
|
|
3. **Commander tracking** — If Commander is available, create a task group and send initial status:
|
|
```
|
|
commander_task { operation: "group:create", group_name: "Autoresearch: <goal>", initiative_summary: "<goal with metric and scope>", total_waves: 1, working_directory: "<cwd>", tasks: [] }
|
|
```
|
|
Store the returned `group_id`. Then broadcast:
|
|
```
|
|
commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Autoresearch started: <goal>. Baseline: <value>. Scope: <files>. Plan approved.", message_type: "status" }
|
|
```
|
|
4. **Update session** — Set session `status` to `"researching"`.
|
|
5. **Begin the loop** — Start iterating immediately. No further confirmation needed.
|
|
|
|
## Step 4: The Loop
|
|
|
|
Parse the arguments for `--iterations N`. If provided, loop exactly N times. Otherwise, loop until interrupted.
|
|
|
|
```
|
|
LOOP:
|
|
1. REVIEW: Read current state of in-scope files + last 10-20 results log entries + git log --oneline -20
|
|
2. IDEATE: Pick next change. Priority:
|
|
a. Fix crashes from previous iteration
|
|
b. Exploit successes — variants of what worked
|
|
c. Explore untried approaches
|
|
d. Combine near-misses
|
|
e. Simplify — remove code while maintaining metric
|
|
f. Radical experiments when stuck
|
|
2b. TRACK: Create + claim a Commander task for this iteration:
|
|
commander_task { operation: "create", description: "Iteration #N: <planned change>", working_directory: "<cwd>", group_id: <group_id> }
|
|
commander_task { operation: "claim", task_id: <task_id>, agent_name: "autoresearch" }
|
|
3. MODIFY: Make ONE focused, atomic change. Describable in one sentence.
|
|
4. COMMIT: git add + git commit -m "experiment: <description>" BEFORE verification
|
|
5. VERIFY: Run the mechanical metric. Capture output. Extract metric value.
|
|
6. DECIDE:
|
|
- IMPROVED -> Keep commit, log "keep"
|
|
- SAME/WORSE -> git reset --hard HEAD~1, log "discard"
|
|
- CRASHED -> Try to fix (max 3 attempts), else git reset --hard HEAD~1, log "crash"
|
|
7. LOG: Append result to autoresearch-results.tsv
|
|
7b. SESSION: On every "keep" or every ~5 iterations, update the session file:
|
|
- Append to `iterations` array: { iteration, commit, metric, delta, status, description }
|
|
- Update `metric.final` with the current best metric value
|
|
7c. COMPLETE: Complete the Commander task with results:
|
|
commander_task { operation: "complete", task_id: <task_id>, result: "<keep|discard|crash>: <description>. Metric: <value> (delta: <delta>)" }
|
|
commander_task { operation: "comment:add", task_id: <task_id>, body: "Status: <status>\nMetric: <value> (delta: <delta>)\nCommit: <hash or '-'>", agent_name: "autoresearch" }
|
|
8. REPEAT: Go to step 1
|
|
Every ~5 iterations, send a mailbox status update:
|
|
commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Iteration #N: metric at <value>. Keeps: X | Discards: Y | Crashes: Z", message_type: "status" }
|
|
```
|
|
|
|
## Critical Rules
|
|
|
|
1. **NEVER STOP. NEVER ASK "should I continue?"** — Loop until interrupted or iteration count reached
|
|
2. **Read before write** — Always re-read files. After rollbacks, state may differ from expectations
|
|
3. **One change per iteration** — Atomic changes. If it breaks, you know exactly why
|
|
4. **Mechanical verification only** — No subjective judgments. Use metrics with numbers
|
|
5. **Automatic rollback** — Failed changes revert instantly via git reset. No debates
|
|
6. **Simplicity wins** — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
|
|
7. **Git is memory** — Every kept change is committed. Read your own git history to learn patterns
|
|
8. **When stuck (>5 consecutive discards):**
|
|
- Re-read ALL in-scope files from scratch
|
|
- Re-read the original goal AND `.context/autoresearch-plan.md` for planned strategy
|
|
- Review entire results log for patterns
|
|
- Try the next untried approach from your plan's Strategy section
|
|
- Try combining 2-3 previously successful changes
|
|
- Try the OPPOSITE of what hasn't been working
|
|
- Try a radical architectural change
|
|
|
|
## Communication Protocol
|
|
|
|
- DO NOT ask "should I keep going?" — YES. ALWAYS. (unless bounded)
|
|
- DO NOT summarize after each iteration — just log and continue
|
|
- DO print a brief one-line status every ~5 iterations
|
|
- DO alert if you discover something surprising
|
|
- DO print a final summary when bounded iterations complete:
|
|
```
|
|
=== Autoresearch Complete (N/N iterations) ===
|
|
Baseline: {baseline} -> Final: {current} ({delta})
|
|
Keeps: X | Discards: Y | Crashes: Z
|
|
Best iteration: #{n} — {description}
|
|
```
|
|
- DO send a final Commander mailbox broadcast when the loop ends:
|
|
```
|
|
commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Autoresearch complete (N iterations). Baseline: X → Final: Y (delta: Z). Keeps: A | Discards: B | Crashes: C", message_type: "result" }
|
|
```
|
|
- DO proceed to **Step 5** (Research Report & Implementation Handoff) when the loop ends
|
|
|
|
## Step 5: Research Report & Implementation Handoff
|
|
|
|
When the loop ends (bounded iterations reached, goal achieved, or interrupted):
|
|
|
|
1. **Compile findings** — Summarize what worked, what didn't, and extract prioritized next steps:
|
|
- List of actionable implementation items, ranked by impact
|
|
- Each next step should be a concrete task a developer/agent could execute
|
|
- Include "Recommended Implementation Approach" — how to best implement the findings
|
|
|
|
2. **Update session** — Write findings and next steps to the session file:
|
|
- Set `findings` to the markdown findings summary
|
|
- Set `nextSteps` array with prioritized action items (each with priority number, description, status "pending")
|
|
- Update `metric.final` with the final metric value
|
|
- Keep `status` as `"researching"` (not yet implementing)
|
|
|
|
3. **Present the research report** — Call `show_report` framed as a handoff to implementation:
|
|
```
|
|
show_report {
|
|
title: "Research Complete — Ready for Implementation: <goal>",
|
|
summary: "## Research Results\n\nBaseline: X → Final: Y (delta: Z)\n\n**Iterations:** N total (A keeps, B discards, C crashes)\n\n**Best:** #M — <description>\n\n## Prioritized Next Steps\n\n1. <highest priority action item>\n2. <second priority>\n3. <third priority>\n...\n\n## Plan vs. Reality\n\n<strategies tried, outcomes, surprises>\n\n## Recommended Implementation Approach\n\n<how to implement these findings>"
|
|
}
|
|
```
|
|
|
|
4. **Ask user about implementation** — After the report closes:
|
|
```
|
|
ask_user {
|
|
question: "Research complete. What would you like to do next?",
|
|
mode: "select",
|
|
options: [
|
|
{ label: "Implement now — spawn a team to execute the findings" },
|
|
{ label: "Save & pause — resume implementation later via /research" },
|
|
{ label: "Done — research only, no implementation needed" }
|
|
]
|
|
}
|
|
```
|
|
|
|
5. **Handle the choice:**
|
|
- **"Implement now"** → proceed to Step 6
|
|
- **"Save & pause"** → set session `status` to `"paused"`, print the session ID for later resume
|
|
- **"Done"** → set session `status` to `"complete"`, done
|
|
|
|
## Step 6: Implementation (Spawn Team)
|
|
|
|
If the user chooses to implement:
|
|
|
|
1. **Update session** — Set `status` to `"implementing"`, set `implementation.startedAt` to now
|
|
|
|
2. **Create implementation tasks** — Convert the prioritized next steps into a Commander task group:
|
|
```
|
|
commander_task {
|
|
operation: "group:create",
|
|
group_name: "Implement: <goal>",
|
|
initiative_summary: "Implement findings from autoresearch: <goal>",
|
|
total_waves: 1,
|
|
working_directory: "<cwd>",
|
|
tasks: [
|
|
{ description: "<next step 1>", task_prompt: "<detailed implementation instructions>" },
|
|
{ description: "<next step 2>", task_prompt: "<detailed implementation instructions>" },
|
|
...
|
|
]
|
|
}
|
|
```
|
|
|
|
3. **Dispatch implementation agents** — Use `subagent_create_batch` to spawn builder agents:
|
|
```
|
|
subagent_create_batch {
|
|
groupName: "Implement: <goal>",
|
|
agents: [
|
|
{ name: "builder", task: "<detailed task for next step 1>", summary: "<brief>" },
|
|
{ name: "builder", task: "<detailed task for next step 2>", summary: "<brief>" },
|
|
...
|
|
]
|
|
}
|
|
```
|
|
Each agent gets the research context: what was tried, what worked, and specific implementation instructions.
|
|
|
|
4. **Track progress** — Monitor agent completion via Commander. Update session `nextSteps` status as agents finish.
|
|
|
|
5. **Final completion report** — When all implementation is done:
|
|
- Update session: `status` to `"complete"`, `implementation.completedAt` to now, `implementation.summary` with results
|
|
- Present the FINAL comprehensive report:
|
|
```
|
|
show_report {
|
|
title: "Research & Implementation Complete: <goal>",
|
|
summary: "## Original Goal\n\n<goal>\n\n## Research Results\n\nBaseline: X → Final: Y. N iterations (A keeps, B discards).\n\n## Implementation Summary\n\n<what was built, tasks completed>\n\n## Remaining Gaps\n\n<any items not yet addressed>"
|
|
}
|
|
```
|
|
|
|
## Domain Adaptation
|
|
|
|
| Domain | Metric | Verify Command |
|
|
|--------|--------|----------------|
|
|
| Backend code | Tests pass + coverage % | `npm test` |
|
|
| Frontend UI | Lighthouse score | `npx lighthouse` |
|
|
| Performance | Benchmark time (ms) | `npm run bench` |
|
|
| Refactoring | Tests pass + LOC reduced | `npm test && wc -l` |
|
|
| Content | Word count + readability | Custom script |
|
|
|
|
## Anti-Patterns (AVOID)
|
|
|
|
- Repeating an exact change that was already discarded
|
|
- Making multiple unrelated changes at once
|
|
- Chasing marginal gains with ugly complexity
|
|
- Subjective "looks good" instead of metrics
|
|
- Asking for permission to continue iterating
|
|
|
|
## Commander Tracking
|
|
|
|
All Commander integration is **optional** — if Commander is unavailable, skip these calls silently and never let a Commander error interrupt the loop.
|
|
|
|
### Lifecycle Summary
|
|
|
|
| When | What | Tool Call |
|
|
|------|------|-----------|
|
|
| Understand (Step 1) | Ask clarifying questions | `ask_user { mode: "questions", ... }` |
|
|
| Understand (Step 1) | Save initial session | `Write .context/research-sessions/<id>.json` |
|
|
| Plan (Step 2) | Present research plan | `show_plan { file_path: ".context/autoresearch-plan.md", ... }` |
|
|
| Plan (Step 2) | Update session with plan | `Write (update session file)` |
|
|
| Setup (Step 3) | Create task group | `commander_task { operation: "group:create", ... }` |
|
|
| Setup (Step 3) | Announce start | `commander_mailbox { operation: "send", message_type: "status", ... }` |
|
|
| Each iteration (before modify) | Create + claim task | `commander_task { operation: "create", ... }` then `{ operation: "claim", ... }` |
|
|
| Each iteration (after log) | Complete task | `commander_task { operation: "complete", ... }` |
|
|
| Each iteration (after log) | Save to session | `Write (append iteration to session)` |
|
|
| Every ~5 iterations | Progress broadcast | `commander_mailbox { operation: "send", message_type: "status", ... }` |
|
|
| Research complete (Step 5) | Save findings & next steps | `Write (update session with findings)` |
|
|
| Research complete (Step 5) | Research report | `show_report { title: "Research Complete — ...", ... }` |
|
|
| Research complete (Step 5) | Ask about implementation | `ask_user { mode: "select", ... }` |
|
|
| Implementation (Step 6) | Spawn team | `subagent_create_batch { ... }` |
|
|
| Implementation done (Step 6) | Final report | `show_report { title: "Research & Implementation Complete", ... }` |
|
|
|
|
### Task Completion Semantics
|
|
|
|
- **keep** → `complete` with result describing the improvement
|
|
- **discard** → `complete` with result noting the discard (this is expected, not a failure)
|
|
- **crash** → `complete` with result noting the crash and recovery
|
|
- **Only use `fail`** if the entire autoresearch loop must abort due to an unrecoverable error
|
|
|
|
## Session Persistence
|
|
|
|
Every autoresearch session is saved to `.context/research-sessions/<session-id>.json`. This enables:
|
|
- **Resume later** — pick up where you left off via `/research` command
|
|
- **Browse history** — see all past research sessions in the research browser
|
|
- **Track lifecycle** — from understanding through implementation completion
|
|
|
|
The session file is a JSON document tracking: goal, metric, scope, Q&A, plan, iterations, findings, next steps, and implementation status. Update it at every major lifecycle transition (understand → plan → research → implement → complete).
|
|
|
|
**BEGIN NOW. Start with Step 1: Understand the goal, ask clarifying questions if needed, then present a plan for approval. After the plan is approved, set up tracking, start the autonomous loop, and when research completes, present findings and offer implementation.**
|