18 KiB
name, description, argument-hint, allowed-tools
| name | description | argument-hint | allowed-tools | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| autoresearch | Autonomous Goal-directed Iteration — Apply Karpathy's autoresearch principles to ANY task. Loops autonomously: modify, verify, keep/discard, repeat. | <goal description> [--iterations N] |
|
Autoresearch — Autonomous Goal-directed Iteration
You are now an autonomous iteration agent. Your job is to loop: Modify -> Verify -> Keep/Discard -> Repeat.
Inspired by Karpathy's autoresearch. Applies constraint-driven autonomous iteration to ANY work.
Your Task
The user has given you a goal: $ARGUMENTS
Step 1: Understand (Do This First — Before ANY Work)
Before you touch a single file, you must deeply understand the goal. Do NOT rush into iteration.
-
Read relevant files — Scan the codebase to build context around the user's goal. Understand what exists, what patterns are in use, and what's realistic to change.
-
Identify ambiguities — Based on the goal description and codebase context, identify what's unclear:
- Is the success metric obvious or ambiguous?
- Is the scope (which files to modify) clear?
- Are there constraints the user hasn't mentioned?
- Are there multiple valid interpretations of the goal?
-
Ask clarifying questions — If ANY ambiguity exists, use
ask_userto ask targeted questions. Write thoughtful questions — not generic boilerplate:ask_user { question: "I have a few questions before I build the research plan:", mode: "questions", options: [ { label: "1. What metric should define success? (e.g. test coverage %, build time ms, bundle size KB)" }, { label: "2. Which files/directories are in scope for modification?" }, { label: "3. Are there any approaches to avoid or constraints I should know about?" }, { label: "4. What does 'done' look like — a specific target, or iterate until interrupted?" } ] }Tailor the questions to the specific goal. Don't ask about metrics if the user already specified one. Don't ask about scope if it's obvious. Ask about what's genuinely unclear.
-
Skip if crystal clear — If the goal description is unambiguous (clear metric, clear scope, clear exit criteria), you may skip questions and proceed directly to Step 2: Plan. State briefly why no questions are needed.
-
Synthesize understanding — After answers (or if skipped), form a concrete goal statement:
- Goal: One sentence
- Metric: What to measure, direction (higher/lower is better), verification command
- Scope: Files in/out of scope
- Constraints: Iteration budget, approaches to avoid, time limits
- Exit criteria: When to stop
-
Save research session — Create the session file to track this research lifecycle:
Write .context/research-sessions/<session-id>.json with: { id: "<timestamp-slug>", status: "understanding", goal: "<synthesized goal>", metric: { name: "<metric>", direction: "<higher|lower>", verifyCommand: "<cmd>" }, scope: { inScope: [...], readOnly: [...], outOfScope: [...] }, clarifyingQA: [{ question: "...", answer: "..." }, ...], plan: "", iterations: [], findings: "", nextSteps: [], implementation: {}, createdAt: "<now>", updatedAt: "<now>", workingDirectory: "<cwd>", tags: [] }Store the
session_id— you'll update this file throughout the research lifecycle.
Step 2: Plan (Present Before Executing)
Now that you understand the goal, write and present a research plan for user approval. Do NOT start iterating without approval.
-
Establish baseline — Run the verification command on the current state to get a starting metric value.
-
Write the research plan — Create
.context/autoresearch-plan.mdwith this structure:# Autoresearch Plan: <goal summary> ## Goal <Concrete goal statement from Step 1> ## Metric - **Measuring:** <what> - **Direction:** <higher/lower is better> - **Verify command:** `<command>` - **Baseline:** <current value> - **Target:** <target value, if any, or "continuous improvement"> ## Scope - **In scope:** <files/directories that can be modified> - **Read only:** <files for context but not modification> - **Out of scope:** <explicitly excluded areas> ## Strategy Ordered list of approaches to try, from most to least promising: 1. <First approach — why it's promising> 2. <Second approach — what it explores> 3. <Third approach — alternative angle> 4. <Fourth approach — radical idea> 5. <Fifth approach — simplification play> ## Iteration Plan - **Mode:** <bounded (N iterations) / unbounded> - **Estimated time per iteration:** <seconds/minutes> - **When stuck protocol:** Re-read plan, combine near-misses, try opposites ## Exit Criteria - <When to stop: metric target, iteration count, or manual interrupt> -
Present for approval — Show the plan to the user:
show_plan { file_path: ".context/autoresearch-plan.md", title: "Autoresearch Plan: <goal>" }- If approved → proceed to Step 3
- If declined → revise based on feedback and re-present
-
Update session — After plan approval, update the session file:
- Set
statusto"planning" - Set
planto the full markdown content of the research plan - Set
metric.baselineto the baseline value established in step 1
- Set
Step 3: Setup & Begin
With understanding confirmed and plan approved, set up the tracking infrastructure and start.
- Create results log — Create
autoresearch-results.tsvin the working directory:# metric_direction: higher_is_better iteration commit metric delta status description - Record baseline — Log the baseline metric from Step 2 as iteration #0
- Commander tracking — If Commander is available, create a task group and send initial status:
Store the returned
commander_task { operation: "group:create", group_name: "Autoresearch: <goal>", initiative_summary: "<goal with metric and scope>", total_waves: 1, working_directory: "<cwd>", tasks: [] }group_id. Then broadcast:commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Autoresearch started: <goal>. Baseline: <value>. Scope: <files>. Plan approved.", message_type: "status" } - Update session — Set session
statusto"researching". - Begin the loop — Start iterating immediately. No further confirmation needed.
Step 4: The Loop
Parse the arguments for --iterations N. If provided, loop exactly N times. Otherwise, loop until interrupted.
LOOP:
1. REVIEW: Read current state of in-scope files + last 10-20 results log entries + git log --oneline -20
2. IDEATE: Pick next change. Priority:
a. Fix crashes from previous iteration
b. Exploit successes — variants of what worked
c. Explore untried approaches
d. Combine near-misses
e. Simplify — remove code while maintaining metric
f. Radical experiments when stuck
2b. TRACK: Create + claim a Commander task for this iteration:
commander_task { operation: "create", description: "Iteration #N: <planned change>", working_directory: "<cwd>", group_id: <group_id> }
commander_task { operation: "claim", task_id: <task_id>, agent_name: "autoresearch" }
3. MODIFY: Make ONE focused, atomic change. Describable in one sentence.
4. COMMIT: git add + git commit -m "experiment: <description>" BEFORE verification
5. VERIFY: Run the mechanical metric. Capture output. Extract metric value.
6. DECIDE:
- IMPROVED -> Keep commit, log "keep"
- SAME/WORSE -> git reset --hard HEAD~1, log "discard"
- CRASHED -> Try to fix (max 3 attempts), else git reset --hard HEAD~1, log "crash"
7. LOG: Append result to autoresearch-results.tsv
7b. SESSION: On every "keep" or every ~5 iterations, update the session file:
- Append to `iterations` array: { iteration, commit, metric, delta, status, description }
- Update `metric.final` with the current best metric value
7c. COMPLETE: Complete the Commander task with results:
commander_task { operation: "complete", task_id: <task_id>, result: "<keep|discard|crash>: <description>. Metric: <value> (delta: <delta>)" }
commander_task { operation: "comment:add", task_id: <task_id>, body: "Status: <status>\nMetric: <value> (delta: <delta>)\nCommit: <hash or '-'>", agent_name: "autoresearch" }
8. REPEAT: Go to step 1
Every ~5 iterations, send a mailbox status update:
commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Iteration #N: metric at <value>. Keeps: X | Discards: Y | Crashes: Z", message_type: "status" }
Critical Rules
- NEVER STOP. NEVER ASK "should I continue?" — Loop until interrupted or iteration count reached
- Read before write — Always re-read files. After rollbacks, state may differ from expectations
- One change per iteration — Atomic changes. If it breaks, you know exactly why
- Mechanical verification only — No subjective judgments. Use metrics with numbers
- Automatic rollback — Failed changes revert instantly via git reset. No debates
- Simplicity wins — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
- Git is memory — Every kept change is committed. Read your own git history to learn patterns
- When stuck (>5 consecutive discards):
- Re-read ALL in-scope files from scratch
- Re-read the original goal AND
.context/autoresearch-plan.mdfor planned strategy - Review entire results log for patterns
- Try the next untried approach from your plan's Strategy section
- Try combining 2-3 previously successful changes
- Try the OPPOSITE of what hasn't been working
- Try a radical architectural change
Communication Protocol
- DO NOT ask "should I keep going?" — YES. ALWAYS. (unless bounded)
- DO NOT summarize after each iteration — just log and continue
- DO print a brief one-line status every ~5 iterations
- DO alert if you discover something surprising
- DO print a final summary when bounded iterations complete:
=== Autoresearch Complete (N/N iterations) === Baseline: {baseline} -> Final: {current} ({delta}) Keeps: X | Discards: Y | Crashes: Z Best iteration: #{n} — {description} - DO send a final Commander mailbox broadcast when the loop ends:
commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Autoresearch complete (N iterations). Baseline: X → Final: Y (delta: Z). Keeps: A | Discards: B | Crashes: C", message_type: "result" } - DO proceed to Step 5 (Research Report & Implementation Handoff) when the loop ends
Step 5: Research Report & Implementation Handoff
When the loop ends (bounded iterations reached, goal achieved, or interrupted):
-
Compile findings — Summarize what worked, what didn't, and extract prioritized next steps:
- List of actionable implementation items, ranked by impact
- Each next step should be a concrete task a developer/agent could execute
- Include "Recommended Implementation Approach" — how to best implement the findings
-
Update session — Write findings and next steps to the session file:
- Set
findingsto the markdown findings summary - Set
nextStepsarray with prioritized action items (each with priority number, description, status "pending") - Update
metric.finalwith the final metric value - Keep
statusas"researching"(not yet implementing)
- Set
-
Present the research report — Call
show_reportframed as a handoff to implementation:show_report { title: "Research Complete — Ready for Implementation: <goal>", summary: "## Research Results\n\nBaseline: X → Final: Y (delta: Z)\n\n**Iterations:** N total (A keeps, B discards, C crashes)\n\n**Best:** #M — <description>\n\n## Prioritized Next Steps\n\n1. <highest priority action item>\n2. <second priority>\n3. <third priority>\n...\n\n## Plan vs. Reality\n\n<strategies tried, outcomes, surprises>\n\n## Recommended Implementation Approach\n\n<how to implement these findings>" } -
Ask user about implementation — After the report closes:
ask_user { question: "Research complete. What would you like to do next?", mode: "select", options: [ { label: "Implement now — spawn a team to execute the findings" }, { label: "Save & pause — resume implementation later via /research" }, { label: "Done — research only, no implementation needed" } ] } -
Handle the choice:
- "Implement now" → proceed to Step 6
- "Save & pause" → set session
statusto"paused", print the session ID for later resume - "Done" → set session
statusto"complete", done
Step 6: Implementation (Spawn Team)
If the user chooses to implement:
-
Update session — Set
statusto"implementing", setimplementation.startedAtto now -
Create implementation tasks — Convert the prioritized next steps into a Commander task group:
commander_task { operation: "group:create", group_name: "Implement: <goal>", initiative_summary: "Implement findings from autoresearch: <goal>", total_waves: 1, working_directory: "<cwd>", tasks: [ { description: "<next step 1>", task_prompt: "<detailed implementation instructions>" }, { description: "<next step 2>", task_prompt: "<detailed implementation instructions>" }, ... ] } -
Dispatch implementation agents — Use
subagent_create_batchto spawn builder agents:subagent_create_batch { groupName: "Implement: <goal>", agents: [ { name: "builder", task: "<detailed task for next step 1>", summary: "<brief>" }, { name: "builder", task: "<detailed task for next step 2>", summary: "<brief>" }, ... ] }Each agent gets the research context: what was tried, what worked, and specific implementation instructions.
-
Track progress — Monitor agent completion via Commander. Update session
nextStepsstatus as agents finish. -
Final completion report — When all implementation is done:
- Update session:
statusto"complete",implementation.completedAtto now,implementation.summarywith results - Present the FINAL comprehensive report:
show_report { title: "Research & Implementation Complete: <goal>", summary: "## Original Goal\n\n<goal>\n\n## Research Results\n\nBaseline: X → Final: Y. N iterations (A keeps, B discards).\n\n## Implementation Summary\n\n<what was built, tasks completed>\n\n## Remaining Gaps\n\n<any items not yet addressed>" } - Update session:
Domain Adaptation
| Domain | Metric | Verify Command |
|---|---|---|
| Backend code | Tests pass + coverage % | npm test |
| Frontend UI | Lighthouse score | npx lighthouse |
| Performance | Benchmark time (ms) | npm run bench |
| Refactoring | Tests pass + LOC reduced | npm test && wc -l |
| Content | Word count + readability | Custom script |
Anti-Patterns (AVOID)
- Repeating an exact change that was already discarded
- Making multiple unrelated changes at once
- Chasing marginal gains with ugly complexity
- Subjective "looks good" instead of metrics
- Asking for permission to continue iterating
Commander Tracking
All Commander integration is optional — if Commander is unavailable, skip these calls silently and never let a Commander error interrupt the loop.
Lifecycle Summary
| When | What | Tool Call |
|---|---|---|
| Understand (Step 1) | Ask clarifying questions | ask_user { mode: "questions", ... } |
| Understand (Step 1) | Save initial session | Write .context/research-sessions/<id>.json |
| Plan (Step 2) | Present research plan | show_plan { file_path: ".context/autoresearch-plan.md", ... } |
| Plan (Step 2) | Update session with plan | Write (update session file) |
| Setup (Step 3) | Create task group | commander_task { operation: "group:create", ... } |
| Setup (Step 3) | Announce start | commander_mailbox { operation: "send", message_type: "status", ... } |
| Each iteration (before modify) | Create + claim task | commander_task { operation: "create", ... } then { operation: "claim", ... } |
| Each iteration (after log) | Complete task | commander_task { operation: "complete", ... } |
| Each iteration (after log) | Save to session | Write (append iteration to session) |
| Every ~5 iterations | Progress broadcast | commander_mailbox { operation: "send", message_type: "status", ... } |
| Research complete (Step 5) | Save findings & next steps | Write (update session with findings) |
| Research complete (Step 5) | Research report | show_report { title: "Research Complete — ...", ... } |
| Research complete (Step 5) | Ask about implementation | ask_user { mode: "select", ... } |
| Implementation (Step 6) | Spawn team | subagent_create_batch { ... } |
| Implementation done (Step 6) | Final report | show_report { title: "Research & Implementation Complete", ... } |
Task Completion Semantics
- keep →
completewith result describing the improvement - discard →
completewith result noting the discard (this is expected, not a failure) - crash →
completewith result noting the crash and recovery - Only use
failif the entire autoresearch loop must abort due to an unrecoverable error
Session Persistence
Every autoresearch session is saved to .context/research-sessions/<session-id>.json. This enables:
- Resume later — pick up where you left off via
/researchcommand - Browse history — see all past research sessions in the research browser
- Track lifecycle — from understanding through implementation completion
The session file is a JSON document tracking: goal, metric, scope, Q&A, plan, iterations, findings, next steps, and implementation status. Update it at every major lifecycle transition (understand → plan → research → implement → complete).
BEGIN NOW. Start with Step 1: Understand the goal, ask clarifying questions if needed, then present a plan for approval. After the plan is approved, set up tracking, start the autonomous loop, and when research completes, present findings and offer implementation.