Initial: pi-skill — 68 skills, 43 extensions, 11 themes for Pi

Kunthawat Greethong
2026-05-25 16:38:02 +07:00
commit 69f7d8bdda
1689 changed files with 342427 additions and 0 deletions


@@ -0,0 +1,280 @@
---
name: autoresearch
description: Autonomous Goal-directed Iteration. Apply Karpathy's autoresearch principles to ANY task. Loops autonomously — modify, verify, keep/discard, repeat. Invoke with /skill:autoresearch or when user says "work autonomously", "iterate until done", "keep improving", or "run overnight".
allowed-tools: Bash(git:*) Bash(npm:*) Bash(npx:*) Read Write Edit ask_user show_plan show_research subagent_create_batch dispatch_agent commander_task commander_mailbox show_report
---
# Autoresearch — Autonomous Goal-directed Iteration
Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch). Applies constraint-driven autonomous iteration to ANY work — not just ML research.
**Core idea:** You are an autonomous agent. Modify -> Verify -> Keep/Discard -> Repeat.
## When to Activate
- User invokes `/skill:autoresearch` or `/autoresearch`
- User says "work autonomously", "iterate until done", "keep improving", "run overnight"
- Any task requiring repeated iteration cycles with measurable outcomes
## Phase 1: Understand (Do This First — Before ANY Work)
Before touching any files, deeply understand the goal. Do NOT rush into iteration.
1. **Read relevant files** — Scan the codebase to build context around the user's goal. Understand what exists, what patterns are in use, and what's realistic.
2. **Identify ambiguities** — Based on the goal and codebase context, what's unclear?
- Is the success metric obvious or ambiguous?
- Is the scope (which files to modify) clear?
- Are there constraints the user hasn't mentioned?
- Are there multiple valid interpretations?
3. **Ask clarifying questions** — If ANY ambiguity exists, use `ask_user` to ask targeted questions:
```
ask_user {
question: "I have a few questions before I build the research plan:",
mode: "questions",
options: [
{ label: "1. What metric should define success? (e.g. test coverage %, build time ms, bundle size KB)" },
{ label: "2. Which files/directories are in scope for modification?" },
{ label: "3. Are there any approaches to avoid or constraints I should know about?" },
{ label: "4. What does 'done' look like — a specific target, or iterate until interrupted?" }
]
}
```
**Tailor questions to the specific goal.** Don't ask about what's already clear. Ask about genuine ambiguities.
4. **Skip if crystal clear** — If the goal is unambiguous (clear metric, scope, exit criteria), skip questions and proceed to Phase 2. State briefly why no questions are needed.
5. **Synthesize understanding** — Form a concrete statement: Goal, Metric (what + direction + verify command), Scope (in/out), Constraints, Exit criteria.
6. **Save research session** — Create `.context/research-sessions/<session-id>.json` with the initial session data: goal, metric, scope, clarifying Q&A, status "understanding". Store the `session_id` for updates throughout the lifecycle.
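A minimal sketch of step 6, assuming the session is a flat JSON object. The field names mirror the items listed above, but the exact schema and the timestamp-based `session_id` format are assumptions, not something the skill prescribes.

```python
import json
import time
from pathlib import Path


def create_session(root, goal, metric, scope, qa):
    """Write the initial session file and return its session_id."""
    # Hypothetical ID scheme: a timestamp. The skill does not define one.
    session_id = time.strftime("%Y%m%d-%H%M%S")
    session = {
        "session_id": session_id,
        "goal": goal,
        "metric": metric,
        "scope": scope,
        "clarifying_qa": qa,
        "status": "understanding",  # initial status named in step 6
    }
    directory = Path(root) / ".context" / "research-sessions"
    directory.mkdir(parents=True, exist_ok=True)
    (directory / f"{session_id}.json").write_text(json.dumps(session, indent=2))
    return session_id
```

Returning the `session_id` lets later phases reopen the same file when the status changes.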
## Phase 2: Plan (Present Before Executing)
Now write and present a research plan for user approval. Do NOT start iterating without approval.
1. **Establish baseline** — Run the verification command to get a starting metric value.
2. **Write the research plan** — Create `.context/autoresearch-plan.md`:
```markdown
# Autoresearch Plan: <goal summary>
## Goal
<Concrete goal statement>
## Metric
- **Measuring:** <what>
- **Direction:** <higher/lower is better>
- **Verify command:** `<command>`
- **Baseline:** <current value>
- **Target:** <target value or "continuous improvement">
## Scope
- **In scope:** <files/directories to modify>
- **Read only:** <files for context only>
- **Out of scope:** <excluded areas>
## Strategy
Ordered approaches, most to least promising:
1. <First approach — why promising>
2. <Second approach — what it explores>
3. <Third approach — alternative angle>
4. <Fourth approach — radical idea>
5. <Fifth approach — simplification play>
## Iteration Plan
- **Mode:** <bounded (N) / unbounded>
- **Estimated time per iteration:** <seconds/minutes>
- **When stuck:** Re-read plan, combine near-misses, try opposites
## Exit Criteria
- <When to stop>
```
3. **Present for approval:**
```
show_plan { file_path: ".context/autoresearch-plan.md", title: "Autoresearch Plan: <goal>" }
```
- **Approved** → proceed to Phase 3
- **Declined** → revise based on feedback and re-present
4. **Update session** — Set status to "planning", save plan content and baseline metric.
## Phase 3: Setup & Begin
With understanding confirmed and plan approved, set up tracking and start.
1. **Create results log** — Create `autoresearch-results.tsv` (see `references/results-logging.md`)
2. **Record baseline** — Log the baseline metric from Phase 2 as iteration #0
3. **Commander tracking** — If available, create task group and broadcast start (see Commander Integration below)
4. **Update session** — Set status to "researching"
5. **Begin the loop** — Start iterating immediately. No further confirmation needed.
## The Loop
Read `references/autonomous-loop-protocol.md` for full protocol details.
```
LOOP (FOREVER or N times):
  1. Review: Read current state + git history + results log
  2. Ideate: Pick next change based on goal, past results, what hasn't been tried
  3. Modify: Make ONE focused change to in-scope files
  4. Commit: Git commit the change (before verification)
  5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
  6. Decide:
     - IMPROVED -> Keep commit, log "keep", advance
     - SAME/WORSE -> Roll back the commit (git reset --hard HEAD~1), log "discard"
     - CRASHED -> Try to fix (max 3 attempts), else roll back, log "crash", and move on
  7. Log: Record result in results log
  8. Repeat: Go to step 1.
     - If unbounded: NEVER STOP. NEVER ASK "should I continue?"
     - If bounded (N): Stop after N iterations, print final summary
```
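The control flow above can be sketched in Python for the bounded case. This is an illustration only: `propose`, `verify`, `commit`, and `rollback` are hypothetical callables supplied by the caller, a raised exception stands in for a crash, and the "max 3 fix attempts" step is left out to keep the sketch short.

```python
def run_loop(propose, verify, commit, rollback, baseline, max_iterations,
             higher_is_better=True):
    """Bounded loop: one change per iteration, kept only if the metric improves."""
    best = baseline
    log = []
    for iteration in range(1, max_iterations + 1):
        description = propose(log)   # Review + Ideate + Modify
        commit(description)          # Commit before verification
        try:
            metric = verify()        # Mechanical verification
        except Exception:
            rollback()               # Crash: undo the commit and move on
            log.append((iteration, None, "crash", description))
            continue
        improved = metric > best if higher_is_better else metric < best
        if improved:
            best = metric
            log.append((iteration, metric, "keep", description))
        else:
            rollback()               # Same or worse: discard
            log.append((iteration, metric, "discard", description))
    return best, log
```

Passing the log into `propose` reflects the Review step: each new idea can take earlier keeps and discards into account.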
## Critical Rules
1. **Loop until done** — Unbounded: loop until interrupted. Bounded: loop N times then summarize.
2. **Read before write** — Always understand full context before modifying
3. **One change per iteration** — Atomic changes. If it breaks, you know exactly why
4. **Mechanical verification only** — No subjective "looks good". Use metrics
5. **Automatic rollback** — Failed changes revert instantly. No debates
6. **Simplicity wins** — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
7. **Git is memory** — Every kept change committed. Agent reads history to learn patterns
8. **When stuck, think harder** — Re-read files, re-read goal AND `.context/autoresearch-plan.md` for planned strategy, try next untried approach from the plan, combine near-misses, try radical changes. Don't ask for help unless truly blocked by missing access/permissions
## Principles Reference
See `references/core-principles.md` for the 7 generalizable principles from autoresearch.
## Commander Integration (Task Tracking & Visibility)
When Commander is available, autoresearch MUST track every iteration as a Commander task. This gives the dashboard full visibility into autonomous work — just like the `tasks` extension does for manual workflows.
### Setup Phase (Phase 3) — Create Task Group
After establishing the baseline and getting plan approval, create a Commander task group for this research session:
```
commander_task {
operation: "group:create",
group_name: "Autoresearch: <goal summary>",
initiative_summary: "<full goal description with metric and scope>",
total_waves: 1,
working_directory: "<cwd>",
tasks: []
}
```
Store the returned `group_id` — all iteration tasks will be added to this group.
Send an initial mailbox status broadcast:
```
commander_mailbox {
operation: "send",
from_agent: "autoresearch",
to_agent: "commander",
body: "Autoresearch started: <goal>. Baseline metric: <value>. Scope: <files>. Plan approved.",
message_type: "status"
}
```
### Per-Iteration — Create → Claim → Complete
**Before modifying** (step 3 of each loop iteration), create and claim a Commander task:
```
commander_task { operation: "create", description: "Iteration #N: <planned change>", working_directory: "<cwd>", group_id: <group_id> }
commander_task { operation: "claim", task_id: <task_id>, agent_name: "autoresearch" }
```
**After logging results** (step 7 of each loop iteration), complete the task with the outcome:
```
commander_task { operation: "complete", task_id: <task_id>, result: "<status>: <description>. Metric: <old> → <new> (delta: <delta>)" }
```
Also add a comment to the task with detailed results:
```
commander_task { operation: "comment:add", task_id: <task_id>, body: "Status: <keep|discard|crash>\nMetric: <value> (delta: <delta>)\nCommit: <hash or '-'>\nDescription: <what was tried>", agent_name: "autoresearch" }
```
**Note:** Use `complete` for ALL outcomes (keep, discard, crash). Discards and crashes are expected in autoresearch — they're not failures. Reserve `fail` only for unrecoverable errors that halt the entire loop.
### Status Broadcasts — Every ~5 Iterations
Every 5 iterations, send a mailbox status update AND add a comment to the group:
```
commander_mailbox {
operation: "send",
from_agent: "autoresearch",
to_agent: "commander",
body: "Autoresearch progress — Iteration #N: metric at <value> (baseline: <baseline>). Keeps: X | Discards: Y | Crashes: Z",
message_type: "status"
}
```
### Research Complete — Report & Implementation Handoff (MANDATORY)
When the loop ends (bounded mode reaching N, or goal achieved):
1. **Final mailbox broadcast** with full summary:
```
commander_mailbox {
operation: "send",
from_agent: "autoresearch",
to_agent: "commander",
body: "Autoresearch complete (N iterations). Baseline: <X> → Final: <Y> (delta: <Z>). Keeps: A | Discards: B | Crashes: C. Best iteration: #M — <description>",
message_type: "result"
}
```
2. **Compile findings & next steps** — Extract prioritized, actionable implementation items from the research. Update the session file with findings, next steps array, and final metric.
3. **Research report** — Present via `show_report` framed as a handoff:
```
show_report {
title: "Research Complete — Ready for Implementation: <goal>",
summary: "## Research Results\n\n...\n\n## Prioritized Next Steps\n\n1. <action item>\n2. ...\n\n## Recommended Implementation Approach\n\n<how to implement>"
}
```
4. **Ask about implementation** — Use `ask_user` to offer three choices:
- **Implement now** → spawn a team of builder agents via `subagent_create_batch`
- **Save & pause** → set session to "paused", resume later via `/research`
- **Done** → mark session "complete"
5. **Implementation (if chosen)** — Update session to "implementing", create Commander task group, dispatch builders, track completion. When done, present final comprehensive report covering research results AND implementation work. Set session to "complete".
6. **Preserve the plan** — Leave `.context/autoresearch-plan.md` intact. Leave the session file for browsing via `/research`.
### Graceful Degradation
All Commander calls are **optional**. If Commander is unavailable:
- Skip `commander_task` and `commander_mailbox` calls silently
- The local `autoresearch-results.tsv` log remains the primary record
- The `show_report` call still works (it only needs git, not Commander)
- Never let a Commander error interrupt the autonomous loop
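One way to picture this rule is a wrapper that absorbs every Commander failure. The sketch below assumes the Commander tool is exposed as a Python callable (or `None` when unavailable); `call_commander` and that calling convention are hypothetical and not part of Commander itself.

```python
import logging


def call_commander(tool, **params):
    """Invoke a Commander tool if present; never raise into the loop."""
    if tool is None:              # Commander unavailable: skip silently
        return None
    try:
        return tool(**params)
    except Exception as exc:      # any Commander error is non-fatal
        logging.getLogger("autoresearch").debug("Commander call skipped: %s", exc)
        return None
```

Because the wrapper returns `None` on failure, callers should treat a missing `task_id` or `group_id` as "tracking disabled" rather than as an error.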
## Adapting to Different Domains
| Domain | Metric | Scope | Verify Command |
|--------|--------|-------|----------------|
| Backend code | Tests pass + coverage % | `src/**/*.ts` | `npm test` |
| Frontend UI | Lighthouse score | `src/components/**` | `npx lighthouse <url>` |
| ML training | val_bpb / loss | `train.py` | `uv run train.py` |
| Blog/content | Word count + readability | `content/*.md` | Custom script |
| Performance | Benchmark time (ms) | Target files | `npm run bench` |
| Refactoring | Tests pass + LOC reduced | Target module | `npm test && wc -l <files>` |
Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.
## Session Persistence
Every autoresearch session is saved to `.context/research-sessions/<session-id>.json`. This enables:
- **Resume later** — pick up where you left off via `/research` command
- **Browse history** — see all past research sessions in the research browser
- **Track lifecycle** — from understanding through implementation completion
Update the session file at every major transition: understand → plan → research → implement → complete. On every "keep" iteration or every ~5 iterations, append iteration data to the session. This creates a complete record of the research lifecycle that can be browsed and resumed.
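The lifecycle transitions can be sketched as a small state check. The status names come from this document, but the transition table is an assumption inferred from the lifecycle described here, and `set_status` is a hypothetical helper.

```python
import json
from pathlib import Path

# Assumed transition table, inferred from the lifecycle described above.
TRANSITIONS = {
    "understanding": {"planning"},
    "planning": {"researching"},
    "researching": {"implementing", "paused", "complete"},
    "paused": {"researching", "implementing", "complete"},
    "implementing": {"complete"},
    "complete": set(),
}


def set_status(session_path, new_status):
    """Update the status field, rejecting transitions outside the table."""
    path = Path(session_path)
    session = json.loads(path.read_text())
    current = session["status"]
    if new_status not in TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {new_status!r}")
    session["status"] = new_status
    path.write_text(json.dumps(session, indent=2))
    return session
```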


@@ -0,0 +1,246 @@
# Autonomous Loop Protocol
Detailed protocol for the autoresearch iteration loop. SKILL.md has the summary; this file has the full rules.
## Loop Modes
Autoresearch supports two loop modes:
- **Unbounded (default):** Loop forever until manually interrupted (Ctrl+C)
- **Bounded:** Loop exactly N times when user specifies a count
When bounded, track `current_iteration` against `max_iterations`. After the final iteration, print a summary and stop.
## Phase 1: Review (30 seconds)
Before each iteration, build situational awareness:
```
1. Read current state of in-scope files (full context)
2. Read last 10-20 entries from results log
3. Read git log --oneline -20 to see recent changes
4. Re-read `.context/autoresearch-plan.md` Strategy section for planned approaches
5. Identify: what worked, what failed, what's untried from the plan
6. If bounded: check current_iteration vs max_iterations
```
**Why read every time?** After rollbacks, state may differ from what you expect. Never assume — always verify.
## Phase 2: Ideate (Strategic)
Pick the NEXT change. Priority order:
1. **Fix crashes/failures** from previous iteration first
2. **Exploit successes** — if last change improved metric, try variants in same direction
3. **Explore new approaches** — try something the results log shows hasn't been attempted
4. **Combine near-misses** — two changes that individually didn't help might work together
5. **Simplify** — remove code while maintaining metric. Simpler = better
6. **Radical experiments** — when incremental changes stall, try something dramatically different
**Anti-patterns:**
- Don't repeat exact same change that was already discarded
- Don't make multiple unrelated changes at once (can't attribute improvement)
- Don't chase marginal gains with ugly complexity
**Bounded mode consideration:** If remaining iterations are limited (<3 left), prioritize exploiting successes over exploration.
### Commander: Create + Claim Task
After ideating the next change, create a Commander task and claim it:
```
commander_task { operation: "create", description: "Iteration #N: <planned change>", working_directory: "<cwd>", group_id: <group_id> }
commander_task { operation: "claim", task_id: <returned_task_id>, agent_name: "autoresearch" }
```
This makes the iteration visible on the Commander dashboard as an active task.
## Phase 3: Modify (One Atomic Change)
- Make ONE focused change to in-scope files
- The change should be explainable in one sentence
- Write the description BEFORE making the change (forces clarity)
## Phase 4: Commit (Before Verification)
```bash
git add <changed-files>
git commit -m "experiment: <one-sentence description>"
```
Commit BEFORE running verification so rollback is clean: `git reset --hard HEAD~1`
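If the agent drives git from a script, the commit and rollback pair might look like the sketch below. The git commands are the ones shown above; the function names and the injectable `run` parameter (handy for dry runs) are illustrative assumptions.

```python
import subprocess


def commit_experiment(files, description, cwd=".", run=subprocess.run):
    """Stage the changed files and commit them as a single experiment."""
    run(["git", "add", "--", *files], cwd=cwd, check=True)
    run(["git", "commit", "-m", f"experiment: {description}"], cwd=cwd, check=True)


def rollback_experiment(cwd=".", run=subprocess.run):
    """Drop the most recent commit together with its working-tree changes."""
    run(["git", "reset", "--hard", "HEAD~1"], cwd=cwd, check=True)
```

Note that `git reset --hard` also discards any uncommitted edits, which is why each experiment has to be committed in full before verification starts.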
## Phase 5: Verify (Mechanical Only)
Run the agreed-upon verification command. Capture output.
**Timeout rule:** If verification exceeds 2x normal time, kill and treat as crash.
**Extract metric:** Parse the verification output for the specific metric number.
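A sketch of the timeout rule and metric extraction, assuming the metric appears in the command's standard output and can be captured by a regular expression with one group. Treating a non-zero exit code as a crash is an assumption of this sketch; `run_verification` and its parameters are hypothetical.

```python
import re
import subprocess


def run_verification(command, pattern, normal_seconds):
    """Run the verify command; return the metric, or None on crash or timeout."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True,
            timeout=2 * normal_seconds,   # timeout rule: 2x normal time
        )
    except subprocess.TimeoutExpired:
        return None                       # hung verification counts as a crash
    if result.returncode != 0:
        return None                       # assumption: non-zero exit is a crash
    match = re.search(pattern, result.stdout)
    return float(match.group(1)) if match else None
```

For a coverage run, the pattern might be something like `r"coverage:\s*([\d.]+)%"`, depending on what the verification tool actually prints.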
## Phase 6: Decide (No Ambiguity)
```
IF crashed:
  # Attempt fix (max 3 tries)
  IF fixable:
    Fix -> re-commit -> re-verify
  ELSE:
    STATUS = "crash"
    git reset --hard HEAD~1
ELIF metric_improved:
  STATUS = "keep"
  # Do nothing: the commit stays
ELSE:  # metric same or worse
  STATUS = "discard"
  git reset --hard HEAD~1
```
**Simplicity override:** If the metric improved by less than 0.1% but the change adds significant complexity, treat it as "discard". If the metric is unchanged but the code is simpler, treat it as "keep".
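The decision rule, including the simplicity override, can be written as one pure function. This sketch assumes the 0.1% threshold is relative to the previous best, and that the caller supplies the `adds_complexity` and `simpler` judgments; both flags are hypothetical inputs.

```python
def decide(best, metric, higher_is_better, crashed=False,
           adds_complexity=False, simpler=False, min_gain_pct=0.1):
    """Return "keep", "discard", or "crash" for one iteration."""
    if crashed or metric is None:
        return "crash"
    # Positive gain always means "better", whichever direction is preferred.
    gain = (metric - best) if higher_is_better else (best - metric)
    if gain > 0:
        gain_pct = 100.0 * gain / abs(best) if best else float("inf")
        if gain_pct < min_gain_pct and adds_complexity:
            return "discard"   # tiny win bought with ugly complexity
        return "keep"
    if gain == 0 and simpler:
        return "keep"          # same metric with less code
    return "discard"
```

Keeping the function free of side effects means the git rollback and the log entry can both be driven from its return value.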
## Phase 7: Log Results
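Append to the results log (TSV format). This abbreviated example omits the `delta` column; `results-logging.md` defines the full column set: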
```
iteration commit metric status description commander_task_id
42 a1b2c3d 0.9821 keep increase attention heads from 8 to 12 184
43 - 0.9845 discard switch optimizer to SGD 185
44 - 0.0000 crash double batch size (OOM) 186
```
### Commander: Complete Task + Add Comment
After logging, complete the Commander task and add a detailed comment:
```
commander_task { operation: "complete", task_id: <task_id>, result: "<keep|discard|crash>: <description>. Metric: <old> → <new> (delta: <delta>)" }
commander_task { operation: "comment:add", task_id: <task_id>, body: "Status: <status>\nMetric: <value> (delta: <delta>)\nCommit: <hash or '-'>\nDescription: <what was tried>", agent_name: "autoresearch" }
```
Use `complete` for all outcomes — discards and crashes are expected in autoresearch, not failures.
### Session Save
On every "keep" result or every ~5 iterations, update the research session file (`.context/research-sessions/<session-id>.json`):
- Append to the `iterations` array
- Update `metric.final` with the current best metric value
## Phase 8: Repeat
### Commander: Status Broadcast Every ~5 Iterations
Every 5 iterations, send a mailbox status update so the Commander dashboard shows live progress:
```
commander_mailbox {
operation: "send",
from_agent: "autoresearch",
to_agent: "commander",
body: "Autoresearch progress — Iteration #N: metric at <value> (baseline: <baseline>). Keeps: X | Discards: Y | Crashes: Z",
message_type: "status"
}
```
### Unbounded Mode (default)
Go to Phase 1. **NEVER STOP. NEVER ASK IF YOU SHOULD CONTINUE.**
### Bounded Mode (with iteration count)
```
IF goal_achieved:
  Print: "Goal achieved at iteration {N}! Final metric: {value}"
  Print final summary
  STOP
ELIF current_iteration < max_iterations:
  Go to Phase 1
ELSE:
  Print final summary
  STOP
```
**Final summary format:**
```
=== Autoresearch Complete (N/N iterations) ===
Baseline: {baseline} -> Final: {current} ({delta})
Keeps: X | Discards: Y | Crashes: Z
Best iteration: #{n} — {description}
```
### Commander: Final Broadcast + Completion Report (MANDATORY)
When the loop ends (bounded or goal achieved), ALL three steps are required:
1. Send a final mailbox result broadcast:
```
commander_mailbox {
operation: "send",
from_agent: "autoresearch",
to_agent: "commander",
body: "Autoresearch complete (N iterations). Baseline: <X> → Final: <Y> (delta: <Z>). Keeps: A | Discards: B | Crashes: C. Best: #M — <description>",
message_type: "result"
}
```
2. Call `show_report` to open the visual completion report with a rich summary that ties back to the research plan:
```
show_report {
title: "Autoresearch Complete: <goal>",
summary: "## Results\n\nBaseline: <X> → Final: <Y> (delta: <Z>)\n\n**Iterations:** N total (A keeps, B discards, C crashes)\n\n**Best:** #M — <description>\n\n## Plan vs. Reality\n\nReference `.context/autoresearch-plan.md` — which planned strategies were tried? Which worked? What was surprising?\n\n## Kept Changes\n\n<list of kept iterations with descriptions>\n\n## What Didn't Work\n\n<Discarded approaches and why>"
}
```
3. Preserve `.context/autoresearch-plan.md` as a record of the research session. Do not delete it.
### Implementation Handoff
After the completion report, the autoresearch process continues:
1. **Compile findings** — Extract prioritized next steps from the research results
2. **Update session** — Save findings and next steps to the session file
3. **Ask user** — Offer to implement now (spawn team), save & pause, or mark done
4. **If implementing** — Dispatch builder agents via `subagent_create_batch`, track via Commander, present final report when done
5. **Save session** — Update status to "complete" with implementation summary
See the main SKILL.md or autoresearch.md for full implementation handoff instructions.
### When Stuck (>5 consecutive discards)
Applies to both modes:
1. Re-read ALL in-scope files from scratch
2. Re-read the original goal AND `.context/autoresearch-plan.md` for planned strategy
3. Review entire results log for patterns
4. Try the next untried approach from the plan's Strategy section
5. Try combining 2-3 previously successful changes
6. Try the OPPOSITE of what hasn't been working
7. Try a radical architectural change
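Detecting the stuck condition only needs the status column of the results log. In this sketch any non-discard status (including a crash) ends the streak, which is an assumption; the protocol only says "consecutive discards".

```python
def consecutive_discards(statuses):
    """Count the discards at the tail of the status history."""
    count = 0
    for status in reversed(statuses):
        if status != "discard":
            break
        count += 1
    return count


def is_stuck(statuses, threshold=5):
    """True once more than `threshold` discards have occurred in a row."""
    return consecutive_discards(statuses) > threshold
```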
## Crash Recovery
- Syntax error: fix immediately, don't count as separate iteration
- Runtime error: attempt fix (max 3 tries), then move on
- Resource exhaustion (OOM): revert, try smaller variant
- Infinite loop/hang: kill after timeout, revert, avoid that approach
- External dependency failure: skip, log, try different approach
## Communication
- **DO NOT** ask "should I keep going?" — in unbounded mode, YES. ALWAYS. In bounded mode, continue until N is reached.
- **DO NOT** summarize after each iteration — just log and continue
- **DO** print a brief one-line status every ~5 iterations (e.g., "Iteration 25: metric at 0.95, 8 keeps / 17 discards")
- **DO** alert if you discover something surprising or game-changing
- **DO** print a final summary when bounded loop completes
- **DO** send Commander mailbox status broadcasts every ~5 iterations
- **DO** add comments to Commander tasks with detailed iteration results
- **DO** send a final Commander mailbox result broadcast when the loop ends
- **DO** call `show_report` at loop end for a visual completion report
## Commander Graceful Degradation
All Commander integration (task tracking, mailbox broadcasts, comments) is **optional**. If Commander is unavailable:
- Skip all `commander_task` and `commander_mailbox` calls silently
- Never let a Commander error interrupt or break the autonomous loop
- The local `autoresearch-results.tsv` remains the primary record
- `show_report` still works without Commander (it only needs git)


@@ -0,0 +1,92 @@
# Core Principles — From Karpathy's Autoresearch
7 universal principles extracted from autoresearch, applicable to ANY autonomous work.
## 1. Constraint = Enabler
Autonomy succeeds through intentional constraint, not despite it.
| Autoresearch | Generalized |
|--------------|-------------|
| 630-line codebase | Bounded scope that fits agent context |
| 5-minute time budget | Fixed iteration cost |
| One metric (val_bpb) | Single mechanical success criterion |
**Why:** Constraints enable agent confidence (full context understood), verification simplicity (no ambiguity), iteration velocity (low cost = rapid feedback loops).
**Apply:** Before starting, define: what files are in-scope? What's the ONE metric? What's the time budget per iteration?
## 2. Separate Strategy from Tactics
Humans set direction. Agents execute iterations.
| Strategic (Human) | Tactical (Agent) |
|-------------------|------------------|
| "Improve page load speed" | "Lazy-load images, code-split routes" |
| "Increase test coverage" | "Add tests for uncovered edge cases" |
| "Refactor auth module" | "Extract middleware, simplify handlers" |
**Why:** Humans understand WHY. Agents handle HOW. Mixing these roles wastes both human creativity and agent iteration speed.
**Apply:** Get clear direction from user (or project docs). Then iterate autonomously on implementation.
## 3. Metrics Must Be Mechanical
If you can't verify with a command, you can't iterate autonomously.
- Tests pass/fail (exit code 0)
- Benchmark time in milliseconds
- Coverage percentage
- Lighthouse score
- File size in bytes
- Lines of code count
**Anti-pattern:** "Looks better", "probably improved", "seems cleaner" — these KILL autonomous loops because there's no decision function.
**Apply:** Define the grep command (or equivalent) that extracts your metric BEFORE starting.
## 4. Verification Must Be Fast
If verification takes longer than the work itself, incentives misalign.
| Fast (enables iteration) | Slow (kills iteration) |
|-------------------------|----------------------|
| Unit tests (seconds) | Full E2E suite (minutes) |
| Type check (seconds) | Manual QA (hours) |
| Lint check (instant) | Code review (async) |
**Apply:** Use the FASTEST verification that still catches real problems. Save slow verification for after the loop.
## 5. Iteration Cost Shapes Behavior
- Cheap iteration: bold exploration, many experiments
- Expensive iteration: conservative, few experiments
Autoresearch: 5-minute cost = 100 experiments/night.
Software: 10-second test = 360 experiments/hour.
**Apply:** Minimize iteration cost. Use fast tests, incremental builds, targeted verification. Every minute saved = more experiments run.
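The figures above follow from simple division. The sketch assumes an eight-hour night, which is not stated in the text and gives 96 runs, in line with the rounded "100 experiments/night".

```python
def experiments(seconds_per_iteration, hours=1.0):
    """Whole iterations that fit in the time budget."""
    return int(hours * 3600 // seconds_per_iteration)


print(experiments(10))             # 10-second test, one hour: 360
print(experiments(300, hours=8))   # 5-minute run, assumed 8-hour night: 96
```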
## 6. Git as Memory and Audit Trail
Every successful change is committed. This enables:
- **Causality tracking** — which change drove improvement?
- **Stacking wins** — each commit builds on prior successes
- **Pattern learning** — agent sees what worked in THIS codebase
- **Human review** — researcher inspects agent's decision sequence
**Apply:** Commit before verify. Revert on failure. Agent reads its own git history to inform next experiment.
## 7. Honest Limitations
State what the system can and cannot do. Don't oversell.
Autoresearch CANNOT: change tokenizer, replace human direction, guarantee meaningful improvements.
**Apply:** At setup, explicitly state constraints. If agent hits a wall it can't solve (missing permissions, external dependency, needs human judgment), say so clearly instead of guessing.
## The Meta-Principle
> Autonomy scales when you constrain scope, clarify success, mechanize verification, and let agents optimize tactics while humans optimize strategy.
This isn't "removing humans." It's reassigning human effort from execution to direction. Humans become MORE valuable by focusing on irreducibly creative/strategic work.


@@ -0,0 +1,65 @@
# Results Logging Protocol
Track every iteration in a structured log. Enables pattern recognition and prevents repeating failed experiments.
## Log Format (TSV)
Create `autoresearch-results.tsv` in the working directory (gitignored):
```tsv
iteration commit metric delta status description commander_task_id
```
### Columns
| Column | Type | Description |
|--------|------|-------------|
| iteration | int | Sequential counter starting at 0 (baseline) |
| commit | string | Short git hash (7 chars), "-" if reverted |
| metric | float | Measured value from verification |
| delta | float | Change from previous best (negative = improved for "lower is better") |
| status | enum | `baseline`, `keep`, `discard`, `crash` |
| description | string | One-sentence description of what was tried |
| commander_task_id | int | Commander task ID for this iteration, "-" if Commander unavailable |
### Example
```tsv
iteration commit metric delta status description commander_task_id
0 a1b2c3d 85.2 0.0 baseline initial state — test coverage 85.2% -
1 b2c3d4e 87.1 +1.9 keep add tests for auth middleware edge cases 184
2 - 86.5 -0.6 discard refactor test helpers (broke 2 tests) 185
3 - 0.0 0.0 crash add integration tests (DB connection failed) 186
4 c3d4e5f 88.3 +1.2 keep add tests for error handling in API routes 187
5 d4e5f6a 89.0 +0.7 keep add boundary value tests for validators 188
```
## Log Management
- Create at setup (iteration 0 = baseline)
- Append after EVERY iteration (including crashes)
- Do NOT commit this file to git (add to .gitignore)
- Read last 10-20 entries at start of each iteration for context
- Use to detect patterns: what kind of changes tend to succeed?
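Appending and re-reading the log can be done with Python's standard `csv` module. This is a sketch under the column layout defined above; `append_result` and `read_recent` are hypothetical helper names, and lines starting with `#` are treated as comments.

```python
import csv
from pathlib import Path

COLUMNS = ["iteration", "commit", "metric", "delta", "status",
           "description", "commander_task_id"]


def append_result(path, row):
    """Append one iteration; write the header first if the log is new."""
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                                delimiter="\t", lineterminator="\n")
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def read_recent(path, count=20):
    """Return the last `count` rows as dicts, skipping `#` comment lines."""
    with Path(path).open(newline="") as handle:
        lines = [line for line in handle if not line.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))[-count:]
```

All values come back as strings, so convert `metric` and `delta` with `float()` before comparing them.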
## Summary Reporting
Every 10 iterations (or at loop completion in bounded mode), print a brief summary:
```
=== Autoresearch Progress (iteration 20) ===
Baseline: 85.2% -> Current best: 92.1% (+6.9%)
Keeps: 8 | Discards: 10 | Crashes: 2
Last 5: keep, discard, discard, keep, keep
```
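A summary in roughly this shape can be derived from the logged rows. The sketch assumes each row is a dict with string values, as produced by a TSV reader, and that the first row is the baseline; `summarize` is a hypothetical helper.

```python
from collections import Counter


def summarize(rows, higher_is_better=True):
    """Build a progress summary from result rows (dicts with string values)."""
    baseline = float(rows[0]["metric"])
    # Only the baseline and kept iterations can hold the current best.
    kept = [float(r["metric"]) for r in rows if r["status"] in ("baseline", "keep")]
    best = max(kept) if higher_is_better else min(kept)
    counts = Counter(r["status"] for r in rows)
    last = ", ".join(r["status"] for r in rows[1:][-5:])
    return (
        f"=== Autoresearch Progress (iteration {rows[-1]['iteration']}) ===\n"
        f"Baseline: {baseline} -> Current best: {best} ({best - baseline:+.1f})\n"
        f"Keeps: {counts['keep']} | Discards: {counts['discard']} | Crashes: {counts['crash']}\n"
        f"Last 5: {last}"
    )
```

Discarded and crashed rows are excluded from the best value because their changes were rolled back.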
## Metric Direction
Clarify at setup whether lower or higher is better:
- **Lower is better:** val_bpb, response time (ms), bundle size (KB), error count
- **Higher is better:** test coverage (%), lighthouse score, throughput (req/s)
Record the direction as a comment on the first line of the results log, above the header row:
```
# metric_direction: higher_is_better
```
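Reading the direction back and interpreting a delta might look like the sketch below. The value `lower_is_better` is an assumed counterpart to the `higher_is_better` value shown above, and both function names are hypothetical.

```python
def read_direction(lines):
    """Return True if higher is better, based on the direction comment line."""
    for line in lines:
        if line.startswith("# metric_direction:"):
            value = line.split(":", 1)[1].strip()
            if value == "higher_is_better":
                return True
            if value == "lower_is_better":   # assumed counterpart value
                return False
            raise ValueError(f"unknown metric direction: {value!r}")
    raise ValueError("no metric_direction comment found")


def is_improvement(delta, higher_is_better):
    """A positive delta helps when higher is better; a negative one otherwise."""
    return delta > 0 if higher_is_better else delta < 0
```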