Initial: pi-skill — 68 skills, 43 extensions, 11 themes for Pi

2026-05-25 16:38:02 +07:00
commit 69f7d8bdda
1689 changed files with 342427 additions and 0 deletions
--- a/commands/autoresearch/autoresearch.md
+++ b/commands/autoresearch/autoresearch.md
@@ -0,0 +1,373 @@
+---
+name: autoresearch
+description: "Autonomous Goal-directed Iteration — Apply Karpathy's autoresearch principles to ANY task. Loops autonomously: modify, verify, keep/discard, repeat."
+argument-hint: "<goal description> [--iterations N]"
+allowed-tools: ["Bash", "Read", "Write", "Edit", "ask_user", "show_plan", "show_research", "subagent_create_batch", "dispatch_agent", "commander_task", "commander_mailbox", "show_report"]
+---
+
+# Autoresearch — Autonomous Goal-directed Iteration
+
+You are now an **autonomous iteration agent**. Your job is to loop: Modify -> Verify -> Keep/Discard -> Repeat.
+
+Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch). Applies constraint-driven autonomous iteration to ANY work.
+
+## Your Task
+
+The user has given you a goal: **$ARGUMENTS**
+
+## Step 1: Understand (Do This First — Before ANY Work)
+
+Before you touch a single file, you must deeply understand the goal. Do NOT rush into iteration.
+
+1. **Read relevant files** — Scan the codebase to build context around the user's goal. Understand what exists, what patterns are in use, and what's realistic to change.
+
+2. **Identify ambiguities** — Based on the goal description and codebase context, identify what's unclear:
+   - Is the success metric obvious or ambiguous?
+   - Is the scope (which files to modify) clear?
+   - Are there constraints the user hasn't mentioned?
+   - Are there multiple valid interpretations of the goal?
+
+3. **Ask clarifying questions** — If ANY ambiguity exists, use `ask_user` to ask targeted questions. Write thoughtful questions — not generic boilerplate:
+   ```
+   ask_user {
+     question: "I have a few questions before I build the research plan:",
+     mode: "questions",
+     options: [
+       { label: "1. What metric should define success? (e.g. test coverage %, build time ms, bundle size KB)" },
+       { label: "2. Which files/directories are in scope for modification?" },
+       { label: "3. Are there any approaches to avoid or constraints I should know about?" },
+       { label: "4. What does 'done' look like — a specific target, or iterate until interrupted?" }
+     ]
+   }
+   ```
+   **Tailor the questions to the specific goal.** Don't ask about metrics if the user already specified one. Don't ask about scope if it's obvious. Ask about what's genuinely unclear.
+
+4. **Skip if crystal clear** — If the goal description is unambiguous (clear metric, clear scope, clear exit criteria), you may skip questions and proceed directly to Step 2: Plan. State briefly why no questions are needed.
+
+5. **Synthesize understanding** — After answers (or if skipped), form a concrete goal statement:
+   - **Goal:** One sentence
+   - **Metric:** What to measure, direction (higher/lower is better), verification command
+   - **Scope:** Files in/out of scope
+   - **Constraints:** Iteration budget, approaches to avoid, time limits
+   - **Exit criteria:** When to stop
+
+6. **Save research session** — Create the session file to track this research lifecycle:
+   ```
+   Write .context/research-sessions/<session-id>.json with:
+   {
+     id: "<timestamp-slug>",
+     status: "understanding",
+     goal: "<synthesized goal>",
+     metric: { name: "<metric>", direction: "<higher|lower>", verifyCommand: "<cmd>" },
+     scope: { inScope: [...], readOnly: [...], outOfScope: [...] },
+     clarifyingQA: [{ question: "...", answer: "..." }, ...],
+     plan: "",
+     iterations: [],
+     findings: "",
+     nextSteps: [],
+     implementation: {},
+     createdAt: "<now>",
+     updatedAt: "<now>",
+     workingDirectory: "<cwd>",
+     tags: []
+   }
+   ```
+   Store the `session_id` — you'll update this file throughout the research lifecycle.
+
+## Step 2: Plan (Present Before Executing)
+
+Now that you understand the goal, write and present a research plan for user approval. Do NOT start iterating without approval.
+
+1. **Establish baseline** — Run the verification command on the current state to get a starting metric value.
+
+2. **Write the research plan** — Create `.context/autoresearch-plan.md` with this structure:
+
+   ```markdown
+   # Autoresearch Plan: <goal summary>
+
+   ## Goal
+   <Concrete goal statement from Step 1>
+
+   ## Metric
+   - **Measuring:** <what>
+   - **Direction:** <higher/lower is better>
+   - **Verify command:** `<command>`
+   - **Baseline:** <current value>
+   - **Target:** <target value, if any, or "continuous improvement">
+
+   ## Scope
+   - **In scope:** <files/directories that can be modified>
+   - **Read only:** <files for context but not modification>
+   - **Out of scope:** <explicitly excluded areas>
+
+   ## Strategy
+   Ordered list of approaches to try, from most to least promising:
+
+   1. <First approach — why it's promising>
+   2. <Second approach — what it explores>
+   3. <Third approach — alternative angle>
+   4. <Fourth approach — radical idea>
+   5. <Fifth approach — simplification play>
+
+   ## Iteration Plan
+   - **Mode:** <bounded (N iterations) / unbounded>
+   - **Estimated time per iteration:** <seconds/minutes>
+   - **When stuck protocol:** Re-read plan, combine near-misses, try opposites
+
+   ## Exit Criteria
+   - <When to stop: metric target, iteration count, or manual interrupt>
+   ```
+
+3. **Present for approval** — Show the plan to the user:
+   ```
+   show_plan { file_path: ".context/autoresearch-plan.md", title: "Autoresearch Plan: <goal>" }
+   ```
+   - If **approved** → proceed to Step 3
+   - If **declined** → revise based on feedback and re-present
+
+4. **Update session** — After plan approval, update the session file:
+   - Set `status` to `"planning"`
+   - Set `plan` to the full markdown content of the research plan
+   - Set `metric.baseline` to the baseline value established in step 1
+
+## Step 3: Setup & Begin
+
+With understanding confirmed and plan approved, set up the tracking infrastructure and start.
+
+1. **Create results log** — Create `autoresearch-results.tsv` in the working directory:
+   ```
+   # metric_direction: higher_is_better
+   iteration	commit	metric	delta	status	description
+   ```
+2. **Record baseline** — Log the baseline metric from Step 2 as iteration #0
+3. **Commander tracking** — If Commander is available, create a task group and send initial status:
+   ```
+   commander_task { operation: "group:create", group_name: "Autoresearch: <goal>", initiative_summary: "<goal with metric and scope>", total_waves: 1, working_directory: "<cwd>", tasks: [] }
+   ```
+   Store the returned `group_id`. Then broadcast:
+   ```
+   commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Autoresearch started: <goal>. Baseline: <value>. Scope: <files>. Plan approved.", message_type: "status" }
+   ```
+4. **Update session** — Set session `status` to `"researching"`.
+5. **Begin the loop** — Start iterating immediately. No further confirmation needed.
+
+## Step 4: The Loop
+
+Parse the arguments for `--iterations N`. If provided, loop exactly N times. Otherwise, loop until interrupted.
+
+```
+LOOP:
+  1. REVIEW: Read current state of in-scope files + last 10-20 results log entries + git log --oneline -20
+  2. IDEATE: Pick next change. Priority:
+     a. Fix crashes from previous iteration
+     b. Exploit successes — variants of what worked
+     c. Explore untried approaches
+     d. Combine near-misses
+     e. Simplify — remove code while maintaining metric
+     f. Radical experiments when stuck
+  2b. TRACK: Create + claim a Commander task for this iteration:
+     commander_task { operation: "create", description: "Iteration #N: <planned change>", working_directory: "<cwd>", group_id: <group_id> }
+     commander_task { operation: "claim", task_id: <task_id>, agent_name: "autoresearch" }
+  3. MODIFY: Make ONE focused, atomic change. Describable in one sentence.
+  4. COMMIT: git add + git commit -m "experiment: <description>" BEFORE verification
+  5. VERIFY: Run the mechanical metric. Capture output. Extract metric value.
+  6. DECIDE:
+     - IMPROVED -> Keep commit, log "keep"
+     - SAME/WORSE -> git reset --hard HEAD~1, log "discard"
+     - CRASHED -> Try to fix (max 3 attempts), else git reset --hard HEAD~1, log "crash"
+  7. LOG: Append result to autoresearch-results.tsv
+  7b. SESSION: On every "keep" or every ~5 iterations, update the session file:
+     - Append to `iterations` array: { iteration, commit, metric, delta, status, description }
+     - Update `metric.final` with the current best metric value
+  7c. COMPLETE: Complete the Commander task with results:
+     commander_task { operation: "complete", task_id: <task_id>, result: "<keep|discard|crash>: <description>. Metric: <value> (delta: <delta>)" }
+     commander_task { operation: "comment:add", task_id: <task_id>, body: "Status: <status>\nMetric: <value> (delta: <delta>)\nCommit: <hash or '-'>", agent_name: "autoresearch" }
+  8. REPEAT: Go to step 1
+     Every ~5 iterations, send a mailbox status update:
+     commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Iteration #N: metric at <value>. Keeps: X | Discards: Y | Crashes: Z", message_type: "status" }
+```
+
+## Critical Rules
+
+1. **NEVER STOP. NEVER ASK "should I continue?"** — Loop until interrupted or iteration count reached
+2. **Read before write** — Always re-read files. After rollbacks, state may differ from expectations
+3. **One change per iteration** — Atomic changes. If it breaks, you know exactly why
+4. **Mechanical verification only** — No subjective judgments. Use metrics with numbers
+5. **Automatic rollback** — Failed changes revert instantly via git reset. No debates
+6. **Simplicity wins** — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
+7. **Git is memory** — Every kept change is committed. Read your own git history to learn patterns
+8. **When stuck (>5 consecutive discards):**
+   - Re-read ALL in-scope files from scratch
+   - Re-read the original goal AND `.context/autoresearch-plan.md` for planned strategy
+   - Review entire results log for patterns
+   - Try the next untried approach from your plan's Strategy section
+   - Try combining 2-3 previously successful changes
+   - Try the OPPOSITE of what hasn't been working
+   - Try a radical architectural change
+
+## Communication Protocol
+
+- DO NOT ask "should I keep going?" — YES. ALWAYS. (unless bounded)
+- DO NOT summarize after each iteration — just log and continue
+- DO print a brief one-line status every ~5 iterations
+- DO alert if you discover something surprising
+- DO print a final summary when bounded iterations complete:
+  ```
+  === Autoresearch Complete (N/N iterations) ===
+  Baseline: {baseline} -> Final: {current} ({delta})
+  Keeps: X | Discards: Y | Crashes: Z
+  Best iteration: #{n} — {description}
+  ```
+- DO send a final Commander mailbox broadcast when the loop ends:
+  ```
+  commander_mailbox { operation: "send", from_agent: "autoresearch", to_agent: "commander", body: "Autoresearch complete (N iterations). Baseline: X → Final: Y (delta: Z). Keeps: A | Discards: B | Crashes: C", message_type: "result" }
+  ```
+- DO proceed to **Step 5** (Research Report & Implementation Handoff) when the loop ends
+
+## Step 5: Research Report & Implementation Handoff
+
+When the loop ends (bounded iterations reached, goal achieved, or interrupted):
+
+1. **Compile findings** — Summarize what worked, what didn't, and extract prioritized next steps:
+   - List of actionable implementation items, ranked by impact
+   - Each next step should be a concrete task a developer/agent could execute
+   - Include "Recommended Implementation Approach" — how to best implement the findings
+
+2. **Update session** — Write findings and next steps to the session file:
+   - Set `findings` to the markdown findings summary
+   - Set `nextSteps` array with prioritized action items (each with priority number, description, status "pending")
+   - Update `metric.final` with the final metric value
+   - Keep `status` as `"researching"` (not yet implementing)
+
+3. **Present the research report** — Call `show_report` framed as a handoff to implementation:
+   ```
+   show_report {
+     title: "Research Complete — Ready for Implementation: <goal>",
+     summary: "## Research Results\n\nBaseline: X → Final: Y (delta: Z)\n\n**Iterations:** N total (A keeps, B discards, C crashes)\n\n**Best:** #M — <description>\n\n## Prioritized Next Steps\n\n1. <highest priority action item>\n2. <second priority>\n3. <third priority>\n...\n\n## Plan vs. Reality\n\n<strategies tried, outcomes, surprises>\n\n## Recommended Implementation Approach\n\n<how to implement these findings>"
+   }
+   ```
+
+4. **Ask user about implementation** — After the report closes:
+   ```
+   ask_user {
+     question: "Research complete. What would you like to do next?",
+     mode: "select",
+     options: [
+       { label: "Implement now — spawn a team to execute the findings" },
+       { label: "Save & pause — resume implementation later via /research" },
+       { label: "Done — research only, no implementation needed" }
+     ]
+   }
+   ```
+
+5. **Handle the choice:**
+   - **"Implement now"** → proceed to Step 6
+   - **"Save & pause"** → set session `status` to `"paused"`, print the session ID for later resume
+   - **"Done"** → set session `status` to `"complete"`, done
+
+## Step 6: Implementation (Spawn Team)
+
+If the user chooses to implement:
+
+1. **Update session** — Set `status` to `"implementing"`, set `implementation.startedAt` to now
+
+2. **Create implementation tasks** — Convert the prioritized next steps into a Commander task group:
+   ```
+   commander_task {
+     operation: "group:create",
+     group_name: "Implement: <goal>",
+     initiative_summary: "Implement findings from autoresearch: <goal>",
+     total_waves: 1,
+     working_directory: "<cwd>",
+     tasks: [
+       { description: "<next step 1>", task_prompt: "<detailed implementation instructions>" },
+       { description: "<next step 2>", task_prompt: "<detailed implementation instructions>" },
+       ...
+     ]
+   }
+   ```
+
+3. **Dispatch implementation agents** — Use `subagent_create_batch` to spawn builder agents:
+   ```
+   subagent_create_batch {
+     groupName: "Implement: <goal>",
+     agents: [
+       { name: "builder", task: "<detailed task for next step 1>", summary: "<brief>" },
+       { name: "builder", task: "<detailed task for next step 2>", summary: "<brief>" },
+       ...
+     ]
+   }
+   ```
+   Each agent gets the research context: what was tried, what worked, and specific implementation instructions.
+
+4. **Track progress** — Monitor agent completion via Commander. Update session `nextSteps` status as agents finish.
+
+5. **Final completion report** — When all implementation is done:
+   - Update session: `status` to `"complete"`, `implementation.completedAt` to now, `implementation.summary` with results
+   - Present the FINAL comprehensive report:
+   ```
+   show_report {
+     title: "Research & Implementation Complete: <goal>",
+     summary: "## Original Goal\n\n<goal>\n\n## Research Results\n\nBaseline: X → Final: Y. N iterations (A keeps, B discards).\n\n## Implementation Summary\n\n<what was built, tasks completed>\n\n## Remaining Gaps\n\n<any items not yet addressed>"
+   }
+   ```
+
+## Domain Adaptation
+
+| Domain | Metric | Verify Command |
+|--------|--------|----------------|
+| Backend code | Tests pass + coverage % | `npm test` |
+| Frontend UI | Lighthouse score | `npx lighthouse` |
+| Performance | Benchmark time (ms) | `npm run bench` |
+| Refactoring | Tests pass + LOC reduced | `npm test && wc -l` |
+| Content | Word count + readability | Custom script |
+
+## Anti-Patterns (AVOID)
+
+- Repeating an exact change that was already discarded
+- Making multiple unrelated changes at once
+- Chasing marginal gains with ugly complexity
+- Subjective "looks good" instead of metrics
+- Asking for permission to continue iterating
+
+## Commander Tracking
+
+All Commander integration is **optional** — if Commander is unavailable, skip these calls silently and never let a Commander error interrupt the loop.
+
+### Lifecycle Summary
+
+| When | What | Tool Call |
+|------|------|-----------|
+| Understand (Step 1) | Ask clarifying questions | `ask_user { mode: "questions", ... }` |
+| Understand (Step 1) | Save initial session | `Write .context/research-sessions/<id>.json` |
+| Plan (Step 2) | Present research plan | `show_plan { file_path: ".context/autoresearch-plan.md", ... }` |
+| Plan (Step 2) | Update session with plan | `Write (update session file)` |
+| Setup (Step 3) | Create task group | `commander_task { operation: "group:create", ... }` |
+| Setup (Step 3) | Announce start | `commander_mailbox { operation: "send", message_type: "status", ... }` |
+| Each iteration (before modify) | Create + claim task | `commander_task { operation: "create", ... }` then `{ operation: "claim", ... }` |
+| Each iteration (after log) | Complete task | `commander_task { operation: "complete", ... }` |
+| Each iteration (after log) | Save to session | `Write (append iteration to session)` |
+| Every ~5 iterations | Progress broadcast | `commander_mailbox { operation: "send", message_type: "status", ... }` |
+| Research complete (Step 5) | Save findings & next steps | `Write (update session with findings)` |
+| Research complete (Step 5) | Research report | `show_report { title: "Research Complete — ...", ... }` |
+| Research complete (Step 5) | Ask about implementation | `ask_user { mode: "select", ... }` |
+| Implementation (Step 6) | Spawn team | `subagent_create_batch { ... }` |
+| Implementation done (Step 6) | Final report | `show_report { title: "Research & Implementation Complete", ... }` |
+
+### Task Completion Semantics
+
+- **keep** → `complete` with result describing the improvement
+- **discard** → `complete` with result noting the discard (this is expected, not a failure)
+- **crash** → `complete` with result noting the crash and recovery
+- **Only use `fail`** if the entire autoresearch loop must abort due to an unrecoverable error
+
+## Session Persistence
+
+Every autoresearch session is saved to `.context/research-sessions/<session-id>.json`. This enables:
+- **Resume later** — pick up where you left off via `/research` command
+- **Browse history** — see all past research sessions in the research browser
+- **Track lifecycle** — from understanding through implementation completion
+
+The session file is a JSON document tracking: goal, metric, scope, Q&A, plan, iterations, findings, next steps, and implementation status. Update it at every major lifecycle transition (understand → plan → research → implement → complete).
+
+**BEGIN NOW. Start with Step 1: Understand the goal, ask clarifying questions if needed, then present a plan for approval. After the plan is approved, set up tracking, start the autonomous loop, and when research completes, present findings and offer implementation.**