Files
2026-05-25 16:41:08 +07:00

13 KiB

name, description, allowed-tools
name description allowed-tools
autoresearch Autonomous Goal-directed Iteration. Apply Karpathy's autoresearch principles to ANY task. Loops autonomously — modify, verify, keep/discard, repeat. Invoke with /skill:autoresearch or when user says "work autonomously", "iterate until done", "keep improving", or "run overnight". Bash(git:*) Bash(npm:*) Bash(npx:*) Read Write Edit ask_user show_plan show_research subagent_create_batch dispatch_agent commander_task commander_mailbox show_report

Autoresearch — Autonomous Goal-directed Iteration

Inspired by Karpathy's autoresearch. Applies constraint-driven autonomous iteration to ANY work — not just ML research.

Core idea: You are an autonomous agent. Modify -> Verify -> Keep/Discard -> Repeat.

When to Activate

  • User invokes /skill:autoresearch or /autoresearch
  • User says "work autonomously", "iterate until done", "keep improving", "run overnight"
  • Any task requiring repeated iteration cycles with measurable outcomes

Phase 1: Understand (Do This First — Before ANY Work)

Before touching any files, deeply understand the goal. Do NOT rush into iteration.

  1. Read relevant files — Scan the codebase to build context around the user's goal. Understand what exists, what patterns are in use, and what's realistic.

  2. Identify ambiguities — Based on the goal and codebase context, what's unclear?

    • Is the success metric obvious or ambiguous?
    • Is the scope (which files to modify) clear?
    • Are there constraints the user hasn't mentioned?
    • Are there multiple valid interpretations?
  3. Ask clarifying questions — If ANY ambiguity exists, use ask_user to ask targeted questions:

    ask_user {
      question: "I have a few questions before I build the research plan:",
      mode: "questions",
      options: [
        { label: "1. What metric should define success? (e.g. test coverage %, build time ms, bundle size KB)" },
        { label: "2. Which files/directories are in scope for modification?" },
        { label: "3. Are there any approaches to avoid or constraints I should know about?" },
        { label: "4. What does 'done' look like — a specific target, or iterate until interrupted?" }
      ]
    }
    

    Tailor questions to the specific goal. Don't ask about what's already clear. Ask about genuine ambiguities.

  4. Skip if crystal clear — If the goal is unambiguous (clear metric, scope, exit criteria), skip questions and proceed to Phase 2. State briefly why no questions are needed.

  5. Synthesize understanding — Form a concrete statement: Goal, Metric (what + direction + verify command), Scope (in/out), Constraints, Exit criteria.

  6. Save research session — Create .context/research-sessions/<session-id>.json with the initial session data: goal, metric, scope, clarifying Q&A, status "understanding". Store the session_id for updates throughout the lifecycle.

Phase 2: Plan (Present Before Executing)

Now write and present a research plan for user approval. Do NOT start iterating without approval.

  1. Establish baseline — Run the verification command to get a starting metric value.

  2. Write the research plan — Create .context/autoresearch-plan.md:

    # Autoresearch Plan: <goal summary>
    
    ## Goal
    <Concrete goal statement>
    
    ## Metric
    - **Measuring:** <what>
    - **Direction:** <higher/lower is better>
    - **Verify command:** `<command>`
    - **Baseline:** <current value>
    - **Target:** <target value or "continuous improvement">
    
    ## Scope
    - **In scope:** <files/directories to modify>
    - **Read only:** <files for context only>
    - **Out of scope:** <excluded areas>
    
    ## Strategy
    Ordered approaches, most to least promising:
    1. <First approach — why promising>
    2. <Second approach — what it explores>
    3. <Third approach — alternative angle>
    4. <Fourth approach — radical idea>
    5. <Fifth approach — simplification play>
    
    ## Iteration Plan
    - **Mode:** <bounded (N) / unbounded>
    - **Estimated time per iteration:** <seconds/minutes>
    - **When stuck:** Re-read plan, combine near-misses, try opposites
    
    ## Exit Criteria
    - <When to stop>
    
  3. Present for approval:

    show_plan { file_path: ".context/autoresearch-plan.md", title: "Autoresearch Plan: <goal>" }
    
    • Approved → proceed to Phase 3
    • Declined → revise based on feedback and re-present
  4. Update session — Set status to "planning", save plan content and baseline metric.

Phase 3: Setup & Begin

With understanding confirmed and plan approved, set up tracking and start.

  1. Create results log — Create autoresearch-results.tsv (see references/results-logging.md)
  2. Record baseline — Log the baseline metric from Phase 2 as iteration #0
  3. Commander tracking — If available, create task group and broadcast start (see Commander Integration below)
  4. Update session — Set status to "researching"
  5. Begin the loop — Start iterating immediately. No further confirmation needed.

The Loop

Read references/autonomous-loop-protocol.md for full protocol details.

LOOP (FOREVER or N times):
  1. Review: Read current state + git history + results log
  2. Ideate: Pick next change based on goal, past results, what hasn't been tried
  3. Modify: Make ONE focused change to in-scope files
  4. Commit: Git commit the change (before verification)
  5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
  6. Decide:
     - IMPROVED -> Keep commit, log "keep", advance
     - SAME/WORSE -> Git revert, log "discard"
     - CRASHED -> Try to fix (max 3 attempts), else log "crash" and move on
  7. Log: Record result in results log
  8. Repeat: Go to step 1.
     - If unbounded: NEVER STOP. NEVER ASK "should I continue?"
     - If bounded (N): Stop after N iterations, print final summary

Critical Rules

  1. Loop until done — Unbounded: loop until interrupted. Bounded: loop N times then summarize.
  2. Read before write — Always understand full context before modifying
  3. One change per iteration — Atomic changes. If it breaks, you know exactly why
  4. Mechanical verification only — No subjective "looks good". Use metrics
  5. Automatic rollback — Failed changes revert instantly. No debates
  6. Simplicity wins — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
  7. Git is memory — Every kept change committed. Agent reads history to learn patterns
  8. When stuck, think harder — Re-read files, re-read goal AND .context/autoresearch-plan.md for planned strategy, try next untried approach from the plan, combine near-misses, try radical changes. Don't ask for help unless truly blocked by missing access/permissions

Principles Reference

See references/core-principles.md for the 7 generalizable principles from autoresearch.

Commander Integration (Task Tracking & Visibility)

When Commander is available, autoresearch MUST track every iteration as a Commander task. This gives the dashboard full visibility into autonomous work — just like the tasks extension does for manual workflows.

Setup Phase (Phase 3) — Create Task Group

After establishing the baseline and getting plan approval, create a Commander task group for this research session:

commander_task {
  operation: "group:create",
  group_name: "Autoresearch: <goal summary>",
  initiative_summary: "<full goal description with metric and scope>",
  total_waves: 1,
  working_directory: "<cwd>",
  tasks: []
}

Store the returned group_id — all iteration tasks will be added to this group.

Send an initial mailbox status broadcast:

commander_mailbox {
  operation: "send",
  from_agent: "autoresearch",
  to_agent: "commander",
  body: "Autoresearch started: <goal>. Baseline metric: <value>. Scope: <files>. Plan approved.",
  message_type: "status"
}

Per-Iteration — Create → Claim → Complete

Before modifying (step 3 of each loop iteration), create and claim a Commander task:

commander_task { operation: "create", description: "Iteration #N: <planned change>", working_directory: "<cwd>", group_id: <group_id> }
commander_task { operation: "claim", task_id: <task_id>, agent_name: "autoresearch" }

After logging results (step 7 of each loop iteration), complete the task with the outcome:

commander_task { operation: "complete", task_id: <task_id>, result: "<status>: <description>. Metric: <old> → <new> (delta: <delta>)" }

Also add a comment to the task with detailed results:

commander_task { operation: "comment:add", task_id: <task_id>, body: "Status: <keep|discard|crash>\nMetric: <value> (delta: <delta>)\nCommit: <hash or '-'>\nDescription: <what was tried>", agent_name: "autoresearch" }

Note: Use complete for ALL outcomes (keep, discard, crash). Discards and crashes are expected in autoresearch — they're not failures. Reserve fail only for unrecoverable errors that halt the entire loop.

Status Broadcasts — Every ~5 Iterations

Every 5 iterations, send a mailbox status update AND add a comment to the group:

commander_mailbox {
  operation: "send",
  from_agent: "autoresearch",
  to_agent: "commander",
  body: "Autoresearch progress — Iteration #N: metric at <value> (baseline: <baseline>). Keeps: X | Discards: Y | Crashes: Z",
  message_type: "status"
}

Research Complete — Report & Implementation Handoff (MANDATORY)

When the loop ends (bounded mode reaching N, or goal achieved):

  1. Final mailbox broadcast with full summary:
commander_mailbox {
  operation: "send",
  from_agent: "autoresearch",
  to_agent: "commander",
  body: "Autoresearch complete (N iterations). Baseline: <X> → Final: <Y> (delta: <Z>). Keeps: A | Discards: B | Crashes: C. Best iteration: #M — <description>",
  message_type: "result"
}
  1. Compile findings & next steps — Extract prioritized, actionable implementation items from the research. Update the session file with findings, next steps array, and final metric.

  2. Research report — Present via show_report framed as a handoff:

show_report {
  title: "Research Complete — Ready for Implementation: <goal>",
  summary: "## Research Results\n\n...\n\n## Prioritized Next Steps\n\n1. <action item>\n2. ...\n\n## Recommended Implementation Approach\n\n<how to implement>"
}
  1. Ask about implementation — Use ask_user to offer three choices:

    • Implement now → spawn a team of builder agents via subagent_create_batch
    • Save & pause → set session to "paused", resume later via /research
    • Done → mark session "complete"
  2. Implementation (if chosen) — Update session to "implementing", create Commander task group, dispatch builders, track completion. When done, present final comprehensive report covering research results AND implementation work. Set session to "complete".

  3. Preserve the plan — Leave .context/autoresearch-plan.md intact. Leave the session file for browsing via /research.

Graceful Degradation

All Commander calls are optional. If Commander is unavailable:

  • Skip commander_task and commander_mailbox calls silently
  • The local autoresearch-results.tsv log remains the primary record
  • The show_report call still works (it only needs git, not Commander)
  • Never let a Commander error interrupt the autonomous loop

Adapting to Different Domains

Domain Metric Scope Verify Command
Backend code Tests pass + coverage % src/**/*.ts npm test
Frontend UI Lighthouse score src/components/** npx lighthouse
ML training val_bpb / loss train.py uv run train.py
Blog/content Word count + readability content/*.md Custom script
Performance Benchmark time (ms) Target files npm run bench
Refactoring Tests pass + LOC reduced Target module npm test && wc -l

Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.

Session Persistence

Every autoresearch session is saved to .context/research-sessions/<session-id>.json. This enables:

  • Resume later — pick up where you left off via /research command
  • Browse history — see all past research sessions in the research browser
  • Track lifecycle — from understanding through implementation completion

Update the session file at every major transition: understand → plan → research → implement → complete. On every "keep" iteration or every ~5 iterations, append iteration data to the session. This creates a complete record of the research lifecycle that can be browsed and resumed.