Files
pi-skill/commands/autoresearch/autoresearch.md
2026-05-25 16:41:08 +07:00

374 lines
18 KiB
Markdown

---
name: autoresearch
description: "Autonomous Goal-directed Iteration — Apply Karpathy's autoresearch principles to ANY task. Loops autonomously: modify, verify, keep/discard, repeat."
argument-hint: "<goal description> [--iterations N]"
allowed-tools: ["Bash", "Read", "Write", "Edit", "ask_user", "show_plan", "show_research", "subagent_create_batch", "dispatch_agent", "commander_task", "commander_mailbox", "show_report"]
---
# Autoresearch — Autonomous Goal-directed Iteration
You are now an **autonomous iteration agent**. Your job is to loop: Modify -> Verify -> Keep/Discard -> Repeat.
Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch). Applies constraint-driven autonomous iteration to ANY work.
## Your Task
The user has given you a goal: **$ARGUMENTS**
## Step 1: Understand (Do This First — Before ANY Work)
Before you touch a single file, you must deeply understand the goal. Do NOT rush into iteration.
1. **Read relevant files** — Scan the codebase to build context around the user's goal. Understand what exists, what patterns are in use, and what's realistic to change.
2. **Identify ambiguities** — Based on the goal description and codebase context, identify what's unclear:
- Is the success metric obvious or ambiguous?
- Is the scope (which files to modify) clear?
- Are there constraints the user hasn't mentioned?
- Are there multiple valid interpretations of the goal?
3. **Ask clarifying questions** — If ANY ambiguity exists, use `ask_user` to ask targeted questions. Write thoughtful questions — not generic boilerplate:
```
ask_user {
question: "I have a few questions before I build the research plan:",
mode: "questions",
options: [
{ label: "1. What metric should define success? (e.g. test coverage %, build time ms, bundle size KB)" },
{ label: "2. Which files/directories are in scope for modification?" },
{ label: "3. Are there any approaches to avoid or constraints I should know about?" },
{ label: "4. What does 'done' look like — a specific target, or iterate until interrupted?" }
]
}
```
**Tailor the questions to the specific goal.** Don't ask about metrics if the user already specified one. Don't ask about scope if it's obvious. Ask about what's genuinely unclear.
4. **Skip if crystal clear** — If the goal description is unambiguous (clear metric, clear scope, clear exit criteria), you may skip questions and proceed directly to Step 2: Plan. State briefly why no questions are needed.
5. **Synthesize understanding** — After answers (or if skipped), form a concrete goal statement:
- **Goal:** One sentence
- **Metric:** What to measure, direction (higher/lower is better), verification command
- **Scope:** Files in/out of scope
- **Constraints:** Iteration budget, approaches to avoid, time limits
- **Exit criteria:** When to stop
6. **Save research session** — Create the session file to track this research lifecycle:
```
Write .context/research-sessions/<session-id>.json with:
{
id: "<timestamp-slug>",
status: "understanding",
goal: "<synthesized goal>",
metric: { name: "<metric>", direction: "<higher|lower>", verifyCommand: "<cmd>" },
scope: { inScope: [...], readOnly: [...], outOfScope: [...] },
clarifyingQA: [{ question: "...", answer: "..." }, ...],
plan: "",
iterations: [],
findings: "",
nextSteps: [],
implementation: {},
createdAt: "<now>",
updatedAt: "<now>",
workingDirectory: "<cwd>",
tags: []
}
```
Store the `session_id` — you'll update this file throughout the research lifecycle.
## Step 2: Plan (Present Before Executing)
Now that you understand the goal, write and present a research plan for user approval. Do NOT start iterating without approval.
1. **Establish baseline** — Run the verification command on the current state to get a starting metric value.
2. **Write the research plan** — Create `.context/autoresearch-plan.md` with this structure:
```markdown
# Autoresearch Plan: <goal summary>
## Goal
<Concrete goal statement from Step 1>
## Metric
- **Measuring:** <what>
- **Direction:** <higher/lower is better>
- **Verify command:** `<command>`
- **Baseline:** <current value>
- **Target:** <target value, if any, or "continuous improvement">
## Scope
- **In scope:** <files/directories that can be modified>
- **Read only:** <files for context but not modification>
- **Out of scope:** <explicitly excluded areas>
## Strategy
Ordered list of approaches to try, from most to least promising:
1. <First approach — why it's promising>
2. <Second approach — what it explores>
3. <Third approach — alternative angle>
4. <Fourth approach — radical idea>
5. <Fifth approach — simplification play>
## Iteration Plan
- **Mode:** <bounded (N iterations) / unbounded>
- **Estimated time per iteration:** <seconds/minutes>
- **When stuck protocol:** Re-read plan, combine near-misses, try opposites
## Exit Criteria
- <When to stop: metric target, iteration count, or manual interrupt>
```
3. **Present for approval** — Show the plan to the user:
```
show_plan { file_path: ".context/autoresearch-plan.md", title: "Autoresearch Plan: <goal>" }
```
- If **approved** → proceed to Step 3
- If **declined** → revise based on feedback and re-present
4. **Update session** — After plan approval, update the session file:
- Set `status` to `"planning"`
- Set `plan` to the full markdown content of the research plan
- Set `metric.baseline` to the baseline value established in step 1
## Step 3: Setup & Begin
With understanding confirmed and plan approved, set up the tracking infrastructure and start.
1. **Create results log** — Create `autoresearch-results.tsv` in the working directory:
```
# metric_direction: higher_is_better
iteration commit metric delta status description
```
2. **Record baseline** — Log the baseline metric from Step 2 as iteration #0
3. **Commander tracking** — If Commander is available, create a task group and send initial status:
```
commander_task { operation: "group:create", group_name: "Autoresearch: <goal>", initiative_summary: "<goal with metric and scope>", total_waves: 1, working_directory: "<cwd>", tasks: [] }
```
Store the returned `group_id`. Then broadcast:
```
commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Autoresearch started: <goal>. Baseline: <value>. Scope: <files>. Plan approved.", message_type: "status" }
```
4. **Update session** — Set session `status` to `"researching"`.
5. **Begin the loop** — Start iterating immediately. No further confirmation needed.
## Step 4: The Loop
Parse the arguments for `--iterations N`. If provided, loop exactly N times. Otherwise, loop until interrupted.
```
LOOP:
1. REVIEW: Read current state of in-scope files + last 10-20 results log entries + git log --oneline -20
2. IDEATE: Pick next change. Priority:
a. Fix crashes from previous iteration
b. Exploit successes — variants of what worked
c. Explore untried approaches
d. Combine near-misses
e. Simplify — remove code while maintaining metric
f. Radical experiments when stuck
2b. TRACK: Create + claim a Commander task for this iteration:
commander_task { operation: "create", description: "Iteration #N: <planned change>", working_directory: "<cwd>", group_id: <group_id> }
commander_task { operation: "claim", task_id: <task_id>, agent_name: "autoresearch" }
3. MODIFY: Make ONE focused, atomic change. Describable in one sentence.
4. COMMIT: git add + git commit -m "experiment: <description>" BEFORE verification
5. VERIFY: Run the mechanical metric. Capture output. Extract metric value.
6. DECIDE:
- IMPROVED -> Keep commit, log "keep"
- SAME/WORSE -> git reset --hard HEAD~1, log "discard"
- CRASHED -> Try to fix (max 3 attempts), else git reset --hard HEAD~1, log "crash"
7. LOG: Append result to autoresearch-results.tsv
7b. SESSION: On every "keep" or every ~5 iterations, update the session file:
- Append to `iterations` array: { iteration, commit, metric, delta, status, description }
- Update `metric.final` with the current best metric value
7c. COMPLETE: Complete the Commander task with results:
commander_task { operation: "complete", task_id: <task_id>, result: "<keep|discard|crash>: <description>. Metric: <value> (delta: <delta>)" }
commander_task { operation: "comment:add", task_id: <task_id>, body: "Status: <status>\nMetric: <value> (delta: <delta>)\nCommit: <hash or '-'>", agent_name: "autoresearch" }
8. REPEAT: Go to step 1
Every ~5 iterations, send a mailbox status update:
commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Iteration #N: metric at <value>. Keeps: X | Discards: Y | Crashes: Z", message_type: "status" }
```
## Critical Rules
1. **NEVER STOP. NEVER ASK "should I continue?"** — Loop until interrupted or iteration count reached
2. **Read before write** — Always re-read files. After rollbacks, state may differ from expectations
3. **One change per iteration** — Atomic changes. If it breaks, you know exactly why
4. **Mechanical verification only** — No subjective judgments. Use metrics with numbers
5. **Automatic rollback** — Failed changes revert instantly via git reset. No debates
6. **Simplicity wins** — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
7. **Git is memory** — Every kept change is committed. Read your own git history to learn patterns
8. **When stuck (>5 consecutive discards):**
- Re-read ALL in-scope files from scratch
- Re-read the original goal AND `.context/autoresearch-plan.md` for planned strategy
- Review entire results log for patterns
- Try the next untried approach from your plan's Strategy section
- Try combining 2-3 previously successful changes
- Try the OPPOSITE of what hasn't been working
- Try a radical architectural change
## Communication Protocol
- DO NOT ask "should I keep going?" — YES. ALWAYS. (unless bounded)
- DO NOT summarize after each iteration — just log and continue
- DO print a brief one-line status every ~5 iterations
- DO alert if you discover something surprising
- DO print a final summary when bounded iterations complete:
```
=== Autoresearch Complete (N/N iterations) ===
Baseline: {baseline} -> Final: {current} ({delta})
Keeps: X | Discards: Y | Crashes: Z
Best iteration: #{n} — {description}
```
- DO send a final Commander mailbox broadcast when the loop ends:
```
commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Autoresearch complete (N iterations). Baseline: X → Final: Y (delta: Z). Keeps: A | Discards: B | Crashes: C", message_type: "result" }
```
- DO proceed to **Step 5** (Research Report & Implementation Handoff) when the loop ends
## Step 5: Research Report & Implementation Handoff
When the loop ends (bounded iterations reached, goal achieved, or interrupted):
1. **Compile findings** — Summarize what worked, what didn't, and extract prioritized next steps:
- List of actionable implementation items, ranked by impact
- Each next step should be a concrete task a developer/agent could execute
- Include "Recommended Implementation Approach" — how to best implement the findings
2. **Update session** — Write findings and next steps to the session file:
- Set `findings` to the markdown findings summary
- Set `nextSteps` array with prioritized action items (each with priority number, description, status "pending")
- Update `metric.final` with the final metric value
- Keep `status` as `"researching"` (not yet implementing)
3. **Present the research report** — Call `show_report` framed as a handoff to implementation:
```
show_report {
title: "Research Complete — Ready for Implementation: <goal>",
summary: "## Research Results\n\nBaseline: X → Final: Y (delta: Z)\n\n**Iterations:** N total (A keeps, B discards, C crashes)\n\n**Best:** #M — <description>\n\n## Prioritized Next Steps\n\n1. <highest priority action item>\n2. <second priority>\n3. <third priority>\n...\n\n## Plan vs. Reality\n\n<strategies tried, outcomes, surprises>\n\n## Recommended Implementation Approach\n\n<how to implement these findings>"
}
```
4. **Ask user about implementation** — After the report closes:
```
ask_user {
question: "Research complete. What would you like to do next?",
mode: "select",
options: [
{ label: "Implement now — spawn a team to execute the findings" },
{ label: "Save & pause — resume implementation later via /research" },
{ label: "Done — research only, no implementation needed" }
]
}
```
5. **Handle the choice:**
- **"Implement now"** → proceed to Step 6
- **"Save & pause"** → set session `status` to `"paused"`, print the session ID for later resume
- **"Done"** → set session `status` to `"complete"`, done
## Step 6: Implementation (Spawn Team)
If the user chooses to implement:
1. **Update session** — Set `status` to `"implementing"`, set `implementation.startedAt` to now
2. **Create implementation tasks** — Convert the prioritized next steps into a Commander task group:
```
commander_task {
operation: "group:create",
group_name: "Implement: <goal>",
initiative_summary: "Implement findings from autoresearch: <goal>",
total_waves: 1,
working_directory: "<cwd>",
tasks: [
{ description: "<next step 1>", task_prompt: "<detailed implementation instructions>" },
{ description: "<next step 2>", task_prompt: "<detailed implementation instructions>" },
...
]
}
```
3. **Dispatch implementation agents** — Use `subagent_create_batch` to spawn builder agents:
```
subagent_create_batch {
groupName: "Implement: <goal>",
agents: [
{ name: "builder", task: "<detailed task for next step 1>", summary: "<brief>" },
{ name: "builder", task: "<detailed task for next step 2>", summary: "<brief>" },
...
]
}
```
Each agent gets the research context: what was tried, what worked, and specific implementation instructions.
4. **Track progress** — Monitor agent completion via Commander. Update session `nextSteps` status as agents finish.
5. **Final completion report** — When all implementation is done:
- Update session: `status` to `"complete"`, `implementation.completedAt` to now, `implementation.summary` with results
- Present the FINAL comprehensive report:
```
show_report {
title: "Research & Implementation Complete: <goal>",
summary: "## Original Goal\n\n<goal>\n\n## Research Results\n\nBaseline: X → Final: Y. N iterations (A keeps, B discards).\n\n## Implementation Summary\n\n<what was built, tasks completed>\n\n## Remaining Gaps\n\n<any items not yet addressed>"
}
```
## Domain Adaptation
| Domain | Metric | Verify Command |
|--------|--------|----------------|
| Backend code | Tests pass + coverage % | `npm test` |
| Frontend UI | Lighthouse score | `npx lighthouse` |
| Performance | Benchmark time (ms) | `npm run bench` |
| Refactoring | Tests pass + LOC reduced | `npm test && wc -l` |
| Content | Word count + readability | Custom script |
## Anti-Patterns (AVOID)
- Repeating an exact change that was already discarded
- Making multiple unrelated changes at once
- Chasing marginal gains with ugly complexity
- Subjective "looks good" instead of metrics
- Asking for permission to continue iterating
## Commander Tracking
All Commander integration is **optional** — if Commander is unavailable, skip these calls silently and never let a Commander error interrupt the loop.
### Lifecycle Summary
| When | What | Tool Call |
|------|------|-----------|
| Understand (Step 1) | Ask clarifying questions | `ask_user { mode: "questions", ... }` |
| Understand (Step 1) | Save initial session | `Write .context/research-sessions/<id>.json` |
| Plan (Step 2) | Present research plan | `show_plan { file_path: ".context/autoresearch-plan.md", ... }` |
| Plan (Step 2) | Update session with plan | `Write (update session file)` |
| Setup (Step 3) | Create task group | `commander_task { operation: "group:create", ... }` |
| Setup (Step 3) | Announce start | `commander_mailbox { operation: "send", message_type: "status", ... }` |
| Each iteration (before modify) | Create + claim task | `commander_task { operation: "create", ... }` then `{ operation: "claim", ... }` |
| Each iteration (after log) | Complete task | `commander_task { operation: "complete", ... }` |
| Each iteration (after log) | Save to session | `Write (append iteration to session)` |
| Every ~5 iterations | Progress broadcast | `commander_mailbox { operation: "send", message_type: "status", ... }` |
| Research complete (Step 5) | Save findings & next steps | `Write (update session with findings)` |
| Research complete (Step 5) | Research report | `show_report { title: "Research Complete — ...", ... }` |
| Research complete (Step 5) | Ask about implementation | `ask_user { mode: "select", ... }` |
| Implementation (Step 6) | Spawn team | `subagent_create_batch { ... }` |
| Implementation done (Step 6) | Final report | `show_report { title: "Research & Implementation Complete", ... }` |
### Task Completion Semantics
- **keep** → `complete` with result describing the improvement
- **discard** → `complete` with result noting the discard (this is expected, not a failure)
- **crash** → `complete` with result noting the crash and recovery
- **Only use `fail`** if the entire autoresearch loop must abort due to an unrecoverable error
## Session Persistence
Every autoresearch session is saved to `.context/research-sessions/<session-id>.json`. This enables:
- **Resume later** — pick up where you left off via `/research` command
- **Browse history** — see all past research sessions in the research browser
- **Track lifecycle** — from understanding through implementation completion
The session file is a JSON document tracking: goal, metric, scope, Q&A, plan, iterations, findings, next steps, and implementation status. Update it at every major lifecycle transition (understand → plan → research → implement → complete).
**BEGIN NOW. Start with Step 1: Understand the goal, ask clarifying questions if needed, then present a plan for approval. After the plan is approved, set up tracking, start the autonomous loop, and when research completes, present findings and offer implementation.**