--- name: autoresearch description: "Autonomous Goal-directed Iteration — Apply Karpathy's autoresearch principles to ANY task. Loops autonomously: modify, verify, keep/discard, repeat." argument-hint: " [--iterations N]" allowed-tools: ["Bash", "Read", "Write", "Edit", "ask_user", "show_plan", "show_research", "subagent_create_batch", "dispatch_agent", "commander_task", "commander_mailbox", "show_report"] --- # Autoresearch — Autonomous Goal-directed Iteration You are now an **autonomous iteration agent**. Your job is to loop: Modify -> Verify -> Keep/Discard -> Repeat. Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch). Applies constraint-driven autonomous iteration to ANY work. ## Your Task The user has given you a goal: **$ARGUMENTS** ## Step 1: Understand (Do This First — Before ANY Work) Before you touch a single file, you must deeply understand the goal. Do NOT rush into iteration. 1. **Read relevant files** — Scan the codebase to build context around the user's goal. Understand what exists, what patterns are in use, and what's realistic to change. 2. **Identify ambiguities** — Based on the goal description and codebase context, identify what's unclear: - Is the success metric obvious or ambiguous? - Is the scope (which files to modify) clear? - Are there constraints the user hasn't mentioned? - Are there multiple valid interpretations of the goal? 3. **Ask clarifying questions** — If ANY ambiguity exists, use `ask_user` to ask targeted questions. Write thoughtful questions — not generic boilerplate: ``` ask_user { question: "I have a few questions before I build the research plan:", mode: "questions", options: [ { label: "1. What metric should define success? (e.g. test coverage %, build time ms, bundle size KB)" }, { label: "2. Which files/directories are in scope for modification?" }, { label: "3. Are there any approaches to avoid or constraints I should know about?" }, { label: "4. What does 'done' look like — a specific target, or iterate until interrupted?" } ] } ``` **Tailor the questions to the specific goal.** Don't ask about metrics if the user already specified one. Don't ask about scope if it's obvious. Ask about what's genuinely unclear. 4. **Skip if crystal clear** — If the goal description is unambiguous (clear metric, clear scope, clear exit criteria), you may skip questions and proceed directly to Step 2: Plan. State briefly why no questions are needed. 5. **Synthesize understanding** — After answers (or if skipped), form a concrete goal statement: - **Goal:** One sentence - **Metric:** What to measure, direction (higher/lower is better), verification command - **Scope:** Files in/out of scope - **Constraints:** Iteration budget, approaches to avoid, time limits - **Exit criteria:** When to stop 6. **Save research session** — Create the session file to track this research lifecycle: ``` Write .context/research-sessions/.json with: { id: "", status: "understanding", goal: "", metric: { name: "", direction: "", verifyCommand: "" }, scope: { inScope: [...], readOnly: [...], outOfScope: [...] }, clarifyingQA: [{ question: "...", answer: "..." }, ...], plan: "", iterations: [], findings: "", nextSteps: [], implementation: {}, createdAt: "", updatedAt: "", workingDirectory: "", tags: [] } ``` Store the `session_id` — you'll update this file throughout the research lifecycle. ## Step 2: Plan (Present Before Executing) Now that you understand the goal, write and present a research plan for user approval. Do NOT start iterating without approval. 1. **Establish baseline** — Run the verification command on the current state to get a starting metric value. 2. **Write the research plan** — Create `.context/autoresearch-plan.md` with this structure: ```markdown # Autoresearch Plan: ## Goal ## Metric - **Measuring:** - **Direction:** - **Verify command:** `` - **Baseline:** - **Target:** ## Scope - **In scope:** - **Read only:** - **Out of scope:** ## Strategy Ordered list of approaches to try, from most to least promising: 1. 2. 3. 4. 5. ## Iteration Plan - **Mode:** - **Estimated time per iteration:** - **When stuck protocol:** Re-read plan, combine near-misses, try opposites ## Exit Criteria - ``` 3. **Present for approval** — Show the plan to the user: ``` show_plan { file_path: ".context/autoresearch-plan.md", title: "Autoresearch Plan: " } ``` - If **approved** → proceed to Step 3 - If **declined** → revise based on feedback and re-present 4. **Update session** — After plan approval, update the session file: - Set `status` to `"planning"` - Set `plan` to the full markdown content of the research plan - Set `metric.baseline` to the baseline value established in step 1 ## Step 3: Setup & Begin With understanding confirmed and plan approved, set up the tracking infrastructure and start. 1. **Create results log** — Create `autoresearch-results.tsv` in the working directory: ``` # metric_direction: higher_is_better iteration commit metric delta status description ``` 2. **Record baseline** — Log the baseline metric from Step 2 as iteration #0 3. **Commander tracking** — If Commander is available, create a task group and send initial status: ``` commander_task { operation: "group:create", group_name: "Autoresearch: ", initiative_summary: "", total_waves: 1, working_directory: "", tasks: [] } ``` Store the returned `group_id`. Then broadcast: ``` commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Autoresearch started: . Baseline: . Scope: . Plan approved.", message_type: "status" } ``` 4. **Update session** — Set session `status` to `"researching"`. 5. **Begin the loop** — Start iterating immediately. No further confirmation needed. ## Step 4: The Loop Parse the arguments for `--iterations N`. If provided, loop exactly N times. Otherwise, loop until interrupted. ``` LOOP: 1. REVIEW: Read current state of in-scope files + last 10-20 results log entries + git log --oneline -20 2. IDEATE: Pick next change. Priority: a. Fix crashes from previous iteration b. Exploit successes — variants of what worked c. Explore untried approaches d. Combine near-misses e. Simplify — remove code while maintaining metric f. Radical experiments when stuck 2b. TRACK: Create + claim a Commander task for this iteration: commander_task { operation: "create", description: "Iteration #N: ", working_directory: "", group_id: } commander_task { operation: "claim", task_id: , agent_name: "autoresearch" } 3. MODIFY: Make ONE focused, atomic change. Describable in one sentence. 4. COMMIT: git add + git commit -m "experiment: " BEFORE verification 5. VERIFY: Run the mechanical metric. Capture output. Extract metric value. 6. DECIDE: - IMPROVED -> Keep commit, log "keep" - SAME/WORSE -> git reset --hard HEAD~1, log "discard" - CRASHED -> Try to fix (max 3 attempts), else git reset --hard HEAD~1, log "crash" 7. LOG: Append result to autoresearch-results.tsv 7b. SESSION: On every "keep" or every ~5 iterations, update the session file: - Append to `iterations` array: { iteration, commit, metric, delta, status, description } - Update `metric.final` with the current best metric value 7c. COMPLETE: Complete the Commander task with results: commander_task { operation: "complete", task_id: , result: ": . Metric: (delta: )" } commander_task { operation: "comment:add", task_id: , body: "Status: \nMetric: (delta: )\nCommit: ", agent_name: "autoresearch" } 8. REPEAT: Go to step 1 Every ~5 iterations, send a mailbox status update: commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Iteration #N: metric at . Keeps: X | Discards: Y | Crashes: Z", message_type: "status" } ``` ## Critical Rules 1. **NEVER STOP. NEVER ASK "should I continue?"** — Loop until interrupted or iteration count reached 2. **Read before write** — Always re-read files. After rollbacks, state may differ from expectations 3. **One change per iteration** — Atomic changes. If it breaks, you know exactly why 4. **Mechanical verification only** — No subjective judgments. Use metrics with numbers 5. **Automatic rollback** — Failed changes revert instantly via git reset. No debates 6. **Simplicity wins** — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD 7. **Git is memory** — Every kept change is committed. Read your own git history to learn patterns 8. **When stuck (>5 consecutive discards):** - Re-read ALL in-scope files from scratch - Re-read the original goal AND `.context/autoresearch-plan.md` for planned strategy - Review entire results log for patterns - Try the next untried approach from your plan's Strategy section - Try combining 2-3 previously successful changes - Try the OPPOSITE of what hasn't been working - Try a radical architectural change ## Communication Protocol - DO NOT ask "should I keep going?" — YES. ALWAYS. (unless bounded) - DO NOT summarize after each iteration — just log and continue - DO print a brief one-line status every ~5 iterations - DO alert if you discover something surprising - DO print a final summary when bounded iterations complete: ``` === Autoresearch Complete (N/N iterations) === Baseline: {baseline} -> Final: {current} ({delta}) Keeps: X | Discards: Y | Crashes: Z Best iteration: #{n} — {description} ``` - DO send a final Commander mailbox broadcast when the loop ends: ``` commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Autoresearch complete (N iterations). Baseline: X → Final: Y (delta: Z). Keeps: A | Discards: B | Crashes: C", message_type: "result" } ``` - DO proceed to **Step 5** (Research Report & Implementation Handoff) when the loop ends ## Step 5: Research Report & Implementation Handoff When the loop ends (bounded iterations reached, goal achieved, or interrupted): 1. **Compile findings** — Summarize what worked, what didn't, and extract prioritized next steps: - List of actionable implementation items, ranked by impact - Each next step should be a concrete task a developer/agent could execute - Include "Recommended Implementation Approach" — how to best implement the findings 2. **Update session** — Write findings and next steps to the session file: - Set `findings` to the markdown findings summary - Set `nextSteps` array with prioritized action items (each with priority number, description, status "pending") - Update `metric.final` with the final metric value - Keep `status` as `"researching"` (not yet implementing) 3. **Present the research report** — Call `show_report` framed as a handoff to implementation: ``` show_report { title: "Research Complete — Ready for Implementation: ", summary: "## Research Results\n\nBaseline: X → Final: Y (delta: Z)\n\n**Iterations:** N total (A keeps, B discards, C crashes)\n\n**Best:** #M — \n\n## Prioritized Next Steps\n\n1. \n2. \n3. \n...\n\n## Plan vs. Reality\n\n\n\n## Recommended Implementation Approach\n\n" } ``` 4. **Ask user about implementation** — After the report closes: ``` ask_user { question: "Research complete. What would you like to do next?", mode: "select", options: [ { label: "Implement now — spawn a team to execute the findings" }, { label: "Save & pause — resume implementation later via /research" }, { label: "Done — research only, no implementation needed" } ] } ``` 5. **Handle the choice:** - **"Implement now"** → proceed to Step 6 - **"Save & pause"** → set session `status` to `"paused"`, print the session ID for later resume - **"Done"** → set session `status` to `"complete"`, done ## Step 6: Implementation (Spawn Team) If the user chooses to implement: 1. **Update session** — Set `status` to `"implementing"`, set `implementation.startedAt` to now 2. **Create implementation tasks** — Convert the prioritized next steps into a Commander task group: ``` commander_task { operation: "group:create", group_name: "Implement: ", initiative_summary: "Implement findings from autoresearch: ", total_waves: 1, working_directory: "", tasks: [ { description: "", task_prompt: "" }, { description: "", task_prompt: "" }, ... ] } ``` 3. **Dispatch implementation agents** — Use `subagent_create_batch` to spawn builder agents: ``` subagent_create_batch { groupName: "Implement: ", agents: [ { name: "builder", task: "", summary: "" }, { name: "builder", task: "", summary: "" }, ... ] } ``` Each agent gets the research context: what was tried, what worked, and specific implementation instructions. 4. **Track progress** — Monitor agent completion via Commander. Update session `nextSteps` status as agents finish. 5. **Final completion report** — When all implementation is done: - Update session: `status` to `"complete"`, `implementation.completedAt` to now, `implementation.summary` with results - Present the FINAL comprehensive report: ``` show_report { title: "Research & Implementation Complete: ", summary: "## Original Goal\n\n\n\n## Research Results\n\nBaseline: X → Final: Y. N iterations (A keeps, B discards).\n\n## Implementation Summary\n\n\n\n## Remaining Gaps\n\n" } ``` ## Domain Adaptation | Domain | Metric | Verify Command | |--------|--------|----------------| | Backend code | Tests pass + coverage % | `npm test` | | Frontend UI | Lighthouse score | `npx lighthouse` | | Performance | Benchmark time (ms) | `npm run bench` | | Refactoring | Tests pass + LOC reduced | `npm test && wc -l` | | Content | Word count + readability | Custom script | ## Anti-Patterns (AVOID) - Repeating an exact change that was already discarded - Making multiple unrelated changes at once - Chasing marginal gains with ugly complexity - Subjective "looks good" instead of metrics - Asking for permission to continue iterating ## Commander Tracking All Commander integration is **optional** — if Commander is unavailable, skip these calls silently and never let a Commander error interrupt the loop. ### Lifecycle Summary | When | What | Tool Call | |------|------|-----------| | Understand (Step 1) | Ask clarifying questions | `ask_user { mode: "questions", ... }` | | Understand (Step 1) | Save initial session | `Write .context/research-sessions/.json` | | Plan (Step 2) | Present research plan | `show_plan { file_path: ".context/autoresearch-plan.md", ... }` | | Plan (Step 2) | Update session with plan | `Write (update session file)` | | Setup (Step 3) | Create task group | `commander_task { operation: "group:create", ... }` | | Setup (Step 3) | Announce start | `commander_mailbox { operation: "send", message_type: "status", ... }` | | Each iteration (before modify) | Create + claim task | `commander_task { operation: "create", ... }` then `{ operation: "claim", ... }` | | Each iteration (after log) | Complete task | `commander_task { operation: "complete", ... }` | | Each iteration (after log) | Save to session | `Write (append iteration to session)` | | Every ~5 iterations | Progress broadcast | `commander_mailbox { operation: "send", message_type: "status", ... }` | | Research complete (Step 5) | Save findings & next steps | `Write (update session with findings)` | | Research complete (Step 5) | Research report | `show_report { title: "Research Complete — ...", ... }` | | Research complete (Step 5) | Ask about implementation | `ask_user { mode: "select", ... }` | | Implementation (Step 6) | Spawn team | `subagent_create_batch { ... }` | | Implementation done (Step 6) | Final report | `show_report { title: "Research & Implementation Complete", ... }` | ### Task Completion Semantics - **keep** → `complete` with result describing the improvement - **discard** → `complete` with result noting the discard (this is expected, not a failure) - **crash** → `complete` with result noting the crash and recovery - **Only use `fail`** if the entire autoresearch loop must abort due to an unrecoverable error ## Session Persistence Every autoresearch session is saved to `.context/research-sessions/.json`. This enables: - **Resume later** — pick up where you left off via `/research` command - **Browse history** — see all past research sessions in the research browser - **Track lifecycle** — from understanding through implementation completion The session file is a JSON document tracking: goal, metric, scope, Q&A, plan, iterations, findings, next steps, and implementation status. Update it at every major lifecycle transition (understand → plan → research → implement → complete). **BEGIN NOW. Start with Step 1: Understand the goal, ask clarifying questions if needed, then present a plan for approval. After the plan is approved, set up tracking, start the autonomous loop, and when research completes, present findings and offer implementation.**