Initial: pi-skill — 68 skills, 43 extensions, 11 themes for Pi

This commit is contained in:
Kunthawat Greethong
2026-05-25 16:38:02 +07:00
commit 69f7d8bdda
1689 changed files with 342427 additions and 0 deletions

View File

@@ -0,0 +1,246 @@
# Autonomous Loop Protocol
Detailed protocol for the autoresearch iteration loop. SKILL.md has the summary; this file has the full rules.
## Loop Modes
Autoresearch supports two loop modes:
- **Unbounded (default):** Loop forever until manually interrupted (Ctrl+C)
- **Bounded:** Loop exactly N times when user specifies a count
When bounded, track `current_iteration` against `max_iterations`. After the final iteration, print a summary and stop.
## Phase 1: Review (30 seconds)
Before each iteration, build situational awareness:
```
1. Read current state of in-scope files (full context)
2. Read last 10-20 entries from results log
3. Read git log --oneline -20 to see recent changes
4. Re-read `.context/autoresearch-plan.md` Strategy section for planned approaches
5. Identify: what worked, what failed, what's untried from the plan
6. If bounded: check current_iteration vs max_iterations
```
**Why read every time?** After rollbacks, state may differ from what you expect. Never assume — always verify.
## Phase 2: Ideate (Strategic)
Pick the NEXT change. Priority order:
1. **Fix crashes/failures** from previous iteration first
2. **Exploit successes** — if last change improved metric, try variants in same direction
3. **Explore new approaches** — try something the results log shows hasn't been attempted
4. **Combine near-misses** — two changes that individually didn't help might work together
5. **Simplify** — remove code while maintaining metric. Simpler = better
6. **Radical experiments** — when incremental changes stall, try something dramatically different
**Anti-patterns:**
- Don't repeat exact same change that was already discarded
- Don't make multiple unrelated changes at once (can't attribute improvement)
- Don't chase marginal gains with ugly complexity
**Bounded mode consideration:** If remaining iterations are limited (<3 left), prioritize exploiting successes over exploration.
### Commander: Create + Claim Task
After ideating the next change, create a Commander task and claim it:
```
commander_task { operation: "create", description: "Iteration #N: <planned change>", working_directory: "<cwd>", group_id: <group_id> }
commander_task { operation: "claim", task_id: <returned_task_id>, agent_name: "autoresearch" }
```
This makes the iteration visible on the Commander dashboard as an active task.
## Phase 3: Modify (One Atomic Change)
- Make ONE focused change to in-scope files
- The change should be explainable in one sentence
- Write the description BEFORE making the change (forces clarity)
## Phase 4: Commit (Before Verification)
```bash
git add <changed-files>
git commit -m "experiment: <one-sentence description>"
```
Commit BEFORE running verification so rollback is clean: `git reset --hard HEAD~1`
## Phase 5: Verify (Mechanical Only)
Run the agreed-upon verification command. Capture output.
**Timeout rule:** If verification exceeds 2x normal time, kill and treat as crash.
**Extract metric:** Parse the verification output for the specific metric number.
## Phase 6: Decide (No Ambiguity)
```
IF metric_improved:
STATUS = "keep"
# Do nothing — commit stays
ELIF metric_same_or_worse:
STATUS = "discard"
git reset --hard HEAD~1
ELIF crashed:
# Attempt fix (max 3 tries)
IF fixable:
Fix -> re-commit -> re-verify
ELSE:
STATUS = "crash"
git reset --hard HEAD~1
```
**Simplicity override:** If metric barely improved (+<0.1%) but change adds significant complexity, treat as "discard". If metric unchanged but code is simpler, treat as "keep".
## Phase 7: Log Results
Append to results log (TSV format):
```
iteration commit metric status description commander_task_id
42 a1b2c3d 0.9821 keep increase attention heads from 8 to 12 184
43 - 0.9845 discard switch optimizer to SGD 185
44 - 0.0000 crash double batch size (OOM) 186
```
### Commander: Complete Task + Add Comment
After logging, complete the Commander task and add a detailed comment:
```
commander_task { operation: "complete", task_id: <task_id>, result: "<keep|discard|crash>: <description>. Metric: <old> → <new> (delta: <delta>)" }
commander_task { operation: "comment:add", task_id: <task_id>, body: "Status: <status>\nMetric: <value> (delta: <delta>)\nCommit: <hash or '-'>\nDescription: <what was tried>", agent_name: "autoresearch" }
```
Use `complete` for all outcomes — discards and crashes are expected in autoresearch, not failures.
### Session Save
On every "keep" result or every ~5 iterations, update the research session file (`.context/research-sessions/<session-id>.json`):
- Append to the `iterations` array
- Update `metric.final` with the current best metric value
## Phase 8: Repeat
### Commander: Status Broadcast Every ~5 Iterations
Every 5 iterations, send a mailbox status update so the Commander dashboard shows live progress:
```
commander_mailbox {
operation: "send",
from_agent: "autoresearch",
to_agent: "commander",
body: "Autoresearch progress — Iteration #N: metric at <value> (baseline: <baseline>). Keeps: X | Discards: Y | Crashes: Z",
message_type: "status"
}
```
### Unbounded Mode (default)
Go to Phase 1. **NEVER STOP. NEVER ASK IF YOU SHOULD CONTINUE.**
### Bounded Mode (with iteration count)
```
IF current_iteration < max_iterations:
Go to Phase 1
ELIF goal_achieved:
Print: "Goal achieved at iteration {N}! Final metric: {value}"
Print final summary
STOP
ELSE:
Print final summary
STOP
```
**Final summary format:**
```
=== Autoresearch Complete (N/N iterations) ===
Baseline: {baseline} -> Final: {current} ({delta})
Keeps: X | Discards: Y | Crashes: Z
Best iteration: #{n} — {description}
```
### Commander: Final Broadcast + Completion Report (MANDATORY)
When the loop ends (bounded or goal achieved), ALL three steps are required:
1. Send a final mailbox result broadcast:
```
commander_mailbox {
operation: "send",
from_agent: "autoresearch",
to_agent: "commander",
body: "Autoresearch complete (N iterations). Baseline: <X> → Final: <Y> (delta: <Z>). Keeps: A | Discards: B | Crashes: C. Best: #M — <description>",
message_type: "result"
}
```
2. Call `show_report` to open the visual completion report with a rich summary that ties back to the research plan:
```
show_report {
title: "Autoresearch Complete: <goal>",
summary: "## Results\n\nBaseline: <X> → Final: <Y> (delta: <Z>)\n\n**Iterations:** N total (A keeps, B discards, C crashes)\n\n**Best:** #M — <description>\n\n## Plan vs. Reality\n\nReference `.context/autoresearch-plan.md` — which planned strategies were tried? Which worked? What was surprising?\n\n## Kept Changes\n\n<list of kept iterations with descriptions>\n\n## What Didn't Work\n\n<Discarded approaches and why>"
}
```
3. Preserve `.context/autoresearch-plan.md` as a record of the research session. Do not delete it.
### Implementation Handoff
After the completion report, the autoresearch process continues:
1. **Compile findings** — Extract prioritized next steps from the research results
2. **Update session** — Save findings and next steps to the session file
3. **Ask user** — Offer to implement now (spawn team), save & pause, or mark done
4. **If implementing** — Dispatch builder agents via `subagent_create_batch`, track via Commander, present final report when done
5. **Save session** — Update status to "complete" with implementation summary
See the main SKILL.md or autoresearch.md for full implementation handoff instructions.
### When Stuck (>5 consecutive discards)
Applies to both modes:
1. Re-read ALL in-scope files from scratch
2. Re-read the original goal AND `.context/autoresearch-plan.md` for planned strategy
3. Review entire results log for patterns
4. Try the next untried approach from the plan's Strategy section
5. Try combining 2-3 previously successful changes
6. Try the OPPOSITE of what hasn't been working
7. Try a radical architectural change
## Crash Recovery
- Syntax error: fix immediately, don't count as separate iteration
- Runtime error: attempt fix (max 3 tries), then move on
- Resource exhaustion (OOM): revert, try smaller variant
- Infinite loop/hang: kill after timeout, revert, avoid that approach
- External dependency failure: skip, log, try different approach
## Communication
- **DO NOT** ask "should I keep going?" — in unbounded mode, YES. ALWAYS. In bounded mode, continue until N is reached.
- **DO NOT** summarize after each iteration — just log and continue
- **DO** print a brief one-line status every ~5 iterations (e.g., "Iteration 25: metric at 0.95, 8 keeps / 17 discards")
- **DO** alert if you discover something surprising or game-changing
- **DO** print a final summary when bounded loop completes
- **DO** send Commander mailbox status broadcasts every ~5 iterations
- **DO** add comments to Commander tasks with detailed iteration results
- **DO** send a final Commander mailbox result broadcast when the loop ends
- **DO** call `show_report` at loop end for a visual completion report
## Commander Graceful Degradation
All Commander integration (task tracking, mailbox broadcasts, comments) is **optional**. If Commander is unavailable:
- Skip all `commander_task` and `commander_mailbox` calls silently
- Never let a Commander error interrupt or break the autonomous loop
- The local `autoresearch-results.tsv` remains the primary record
- `show_report` still works without Commander (it only needs git)

View File

@@ -0,0 +1,92 @@
# Core Principles — From Karpathy's Autoresearch
7 universal principles extracted from autoresearch, applicable to ANY autonomous work.
## 1. Constraint = Enabler
Autonomy succeeds through intentional constraint, not despite it.
| Autoresearch | Generalized |
|--------------|-------------|
| 630-line codebase | Bounded scope that fits agent context |
| 5-minute time budget | Fixed iteration cost |
| One metric (val_bpb) | Single mechanical success criterion |
**Why:** Constraints enable agent confidence (full context understood), verification simplicity (no ambiguity), iteration velocity (low cost = rapid feedback loops).
**Apply:** Before starting, define: what files are in-scope? What's the ONE metric? What's the time budget per iteration?
## 2. Separate Strategy from Tactics
Humans set direction. Agents execute iterations.
| Strategic (Human) | Tactical (Agent) |
|-------------------|------------------|
| "Improve page load speed" | "Lazy-load images, code-split routes" |
| "Increase test coverage" | "Add tests for uncovered edge cases" |
| "Refactor auth module" | "Extract middleware, simplify handlers" |
**Why:** Humans understand WHY. Agents handle HOW. Mixing these roles wastes both human creativity and agent iteration speed.
**Apply:** Get clear direction from user (or project docs). Then iterate autonomously on implementation.
## 3. Metrics Must Be Mechanical
If you can't verify with a command, you can't iterate autonomously.
- Tests pass/fail (exit code 0)
- Benchmark time in milliseconds
- Coverage percentage
- Lighthouse score
- File size in bytes
- Lines of code count
**Anti-pattern:** "Looks better", "probably improved", "seems cleaner" — these KILL autonomous loops because there's no decision function.
**Apply:** Define the grep command (or equivalent) that extracts your metric BEFORE starting.
## 4. Verification Must Be Fast
If verification takes longer than the work itself, incentives misalign.
| Fast (enables iteration) | Slow (kills iteration) |
|-------------------------|----------------------|
| Unit tests (seconds) | Full E2E suite (minutes) |
| Type check (seconds) | Manual QA (hours) |
| Lint check (instant) | Code review (async) |
**Apply:** Use the FASTEST verification that still catches real problems. Save slow verification for after the loop.
## 5. Iteration Cost Shapes Behavior
- Cheap iteration: bold exploration, many experiments
- Expensive iteration: conservative, few experiments
Autoresearch: 5-minute cost = 100 experiments/night.
Software: 10-second test = 360 experiments/hour.
**Apply:** Minimize iteration cost. Use fast tests, incremental builds, targeted verification. Every minute saved = more experiments run.
## 6. Git as Memory and Audit Trail
Every successful change is committed. This enables:
- **Causality tracking** — which change drove improvement?
- **Stacking wins** — each commit builds on prior successes
- **Pattern learning** — agent sees what worked in THIS codebase
- **Human review** — researcher inspects agent's decision sequence
**Apply:** Commit before verify. Revert on failure. Agent reads its own git history to inform next experiment.
## 7. Honest Limitations
State what the system can and cannot do. Don't oversell.
Autoresearch CANNOT: change tokenizer, replace human direction, guarantee meaningful improvements.
**Apply:** At setup, explicitly state constraints. If agent hits a wall it can't solve (missing permissions, external dependency, needs human judgment), say so clearly instead of guessing.
## The Meta-Principle
> Autonomy scales when you constrain scope, clarify success, mechanize verification, and let agents optimize tactics while humans optimize strategy.
This isn't "removing humans." It's reassigning human effort from execution to direction. Humans become MORE valuable by focusing on irreducibly creative/strategic work.

View File

@@ -0,0 +1,65 @@
# Results Logging Protocol
Track every iteration in a structured log. Enables pattern recognition and prevents repeating failed experiments.
## Log Format (TSV)
Create `autoresearch-results.tsv` in the working directory (gitignored):
```tsv
iteration commit metric delta status description commander_task_id
```
### Columns
| Column | Type | Description |
|--------|------|-------------|
| iteration | int | Sequential counter starting at 0 (baseline) |
| commit | string | Short git hash (7 chars), "-" if reverted |
| metric | float | Measured value from verification |
| delta | float | Change from previous best (negative = improved for "lower is better") |
| status | enum | `baseline`, `keep`, `discard`, `crash` |
| description | string | One-sentence description of what was tried |
| commander_task_id | int | Commander task ID for this iteration, "-" if Commander unavailable |
### Example
```tsv
iteration commit metric delta status description commander_task_id
0 a1b2c3d 85.2 0.0 baseline initial state — test coverage 85.2% -
1 b2c3d4e 87.1 +1.9 keep add tests for auth middleware edge cases 184
2 - 86.5 -0.6 discard refactor test helpers (broke 2 tests) 185
3 - 0.0 0.0 crash add integration tests (DB connection failed) 186
4 c3d4e5f 88.3 +1.2 keep add tests for error handling in API routes 187
5 d4e5f6g 89.0 +0.7 keep add boundary value tests for validators 188
```
## Log Management
- Create at setup (iteration 0 = baseline)
- Append after EVERY iteration (including crashes)
- Do NOT commit this file to git (add to .gitignore)
- Read last 10-20 entries at start of each iteration for context
- Use to detect patterns: what kind of changes tend to succeed?
## Summary Reporting
Every 10 iterations (or at loop completion in bounded mode), print a brief summary:
```
=== Autoresearch Progress (iteration 20) ===
Baseline: 85.2% -> Current best: 92.1% (+6.9%)
Keeps: 8 | Discards: 10 | Crashes: 2
Last 5: keep, discard, discard, keep, keep
```
## Metric Direction
Clarify at setup whether lower or higher is better:
- **Lower is better:** val_bpb, response time (ms), bundle size (KB), error count
- **Higher is better:** test coverage (%), lighthouse score, throughput (req/s)
Record direction in first line of results log as a comment:
```
# metric_direction: higher_is_better
```