Initial: pi-skill — 68 skills, 43 extensions, 11 themes for Pi
skills/autoresearch/references/results-logging.md
# Results Logging Protocol

Track every iteration in a structured log. The log enables pattern recognition and prevents repeating failed experiments.

## Log Format (TSV)

Create `autoresearch-results.tsv` in the working directory (gitignored):

```tsv
iteration	commit	metric	delta	status	description	commander_task_id
```
### Columns

| Column | Type | Description |
|--------|------|-------------|
| iteration | int | Sequential counter starting at 0 (baseline) |
| commit | string | Short git hash (7 chars), "-" if reverted |
| metric | float | Measured value from verification |
| delta | float | Change from previous best (negative = improved for "lower is better") |
| status | enum | `baseline`, `keep`, `discard`, `crash` |
| description | string | One-sentence description of what was tried |
| commander_task_id | int | Commander task ID for this iteration, "-" if Commander unavailable |
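
The column rules above can be sketched as a small row writer. This is a minimal illustration in Python using the standard `csv` module, not the skill's own implementation: the names `LogRow`, `append_row`, `COLUMNS`, and `STATUSES` are hypothetical, and writing nonzero deltas with an explicit sign is an assumption taken from the example rows in this document.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Column order taken from the header row of the log format.
COLUMNS = ["iteration", "commit", "metric", "delta", "status",
           "description", "commander_task_id"]
STATUSES = {"baseline", "keep", "discard", "crash"}


@dataclass
class LogRow:
    iteration: int
    commit: Optional[str]                    # None is written as "-" (reverted)
    metric: float
    delta: float
    status: str
    description: str
    commander_task_id: Optional[int] = None  # None is written as "-"

    def to_fields(self) -> list[str]:
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        # Assumption: nonzero deltas carry an explicit sign, zero is "0.0".
        delta = "0.0" if self.delta == 0 else f"{self.delta:+.1f}"
        task = "-" if self.commander_task_id is None else str(self.commander_task_id)
        # Collapse tabs and newlines so the description stays in one cell.
        description = " ".join(self.description.split())
        return [str(self.iteration), self.commit or "-", f"{self.metric:.1f}",
                delta, self.status, description, task]


def append_row(path: Path, row: LogRow) -> None:
    """Append one row, writing the header first if the file is new or empty."""
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        if is_new:
            writer.writerow(COLUMNS)
        writer.writerow(row.to_fields())
```

Opening the file in append mode means a crash mid-run never truncates earlier rows. The sketch does not write the metric-direction comment line described under Metric Direction; that would need to be added before the header.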
### Example

```tsv
iteration	commit	metric	delta	status	description	commander_task_id
0	a1b2c3d	85.2	0.0	baseline	initial state: test coverage 85.2%	-
1	b2c3d4e	87.1	+1.9	keep	add tests for auth middleware edge cases	184
2	-	86.5	-0.6	discard	refactor test helpers (broke 2 tests)	185
3	-	0.0	0.0	crash	add integration tests (DB connection failed)	186
4	c3d4e5f	88.3	+1.2	keep	add tests for error handling in API routes	187
5	d4e5f6a	89.0	+0.7	keep	add boundary value tests for validators	188
```
## Log Management

- Create the log at setup (iteration 0 = baseline)
- Append after EVERY iteration (including crashes)
- Do NOT commit this file to git (add it to `.gitignore`)
- Read the last 10-20 entries at the start of each iteration for context
- Use it to detect patterns: what kinds of changes tend to succeed?
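
Reading the most recent entries for context could look like the sketch below. It assumes Python and the standard `csv` module; `read_recent` and `status_counts` are hypothetical helper names, and skipping lines that start with `#` is an assumption made so the metric-direction comment does not get parsed as the header.

```python
import csv
from collections import deque
from pathlib import Path


def read_recent(path: Path, n: int = 20) -> list[dict[str, str]]:
    """Return the last n data rows as dicts keyed by column name."""
    with path.open(newline="", encoding="utf-8") as f:
        # Skip blank lines and "#" comment lines before the header row.
        data_lines = (line for line in f
                      if line.strip() and not line.startswith("#"))
        reader = csv.DictReader(data_lines, delimiter="\t")
        # A bounded deque keeps only the newest n rows while streaming the file.
        return list(deque(reader, maxlen=n))


def status_counts(rows: list[dict[str, str]]) -> dict[str, int]:
    """Count how many of the given rows ended in each status."""
    counts: dict[str, int] = {}
    for row in rows:
        counts[row["status"]] = counts.get(row["status"], 0) + 1
    return counts
```

All values come back as strings, so `metric` and `delta` need a `float(...)` conversion before any arithmetic.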
## Summary Reporting

Every 10 iterations (or at loop completion in bounded mode), print a brief summary:

```
=== Autoresearch Progress (iteration 20) ===
Baseline: 85.2% -> Current best: 92.1% (+6.9%)
Keeps: 8 | Discards: 10 | Crashes: 2
Last 5: keep, discard, discard, keep, keep
```
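
One way to produce a summary in this shape from parsed log rows is sketched below in Python. The function name `format_summary` and its parameters are hypothetical. It assumes the full log is passed in with the baseline as the first row, that every value is still a string, and that only `baseline` and `keep` rows count toward the current best because discarded and crashed iterations were reverted.

```python
def format_summary(rows: list[dict[str, str]], higher_is_better: bool = True,
                   unit: str = "%") -> str:
    """Build the progress summary text from the full list of log rows."""
    baseline = float(rows[0]["metric"])
    # Reverted iterations (discard, crash) never become the current best.
    kept = [float(r["metric"]) for r in rows
            if r["status"] in ("baseline", "keep")]
    best = max(kept) if higher_is_better else min(kept)
    counts = {s: sum(r["status"] == s for r in rows)
              for s in ("keep", "discard", "crash")}
    # Exclude the baseline row, then take the five most recent statuses.
    last5 = ", ".join(r["status"] for r in rows[1:][-5:])
    return "\n".join([
        f"=== Autoresearch Progress (iteration {rows[-1]['iteration']}) ===",
        f"Baseline: {baseline:.1f}{unit} -> Current best: {best:.1f}{unit} "
        f"({best - baseline:+.1f}{unit})",
        f"Keeps: {counts['keep']} | Discards: {counts['discard']} "
        f"| Crashes: {counts['crash']}",
        f"Last 5: {last5}",
    ])
```

The `unit` parameter defaults to `%` only because the example metric is test coverage; other metrics would pass their own unit or an empty string.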
## Metric Direction

Clarify at setup whether lower or higher is better:
- **Lower is better:** val_bpb, response time (ms), bundle size (KB), error count
- **Higher is better:** test coverage (%), lighthouse score, throughput (req/s)

Record the direction in the first line of the results log as a comment, above the header row:
```
# metric_direction: higher_is_better
```
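
A sketch of how the recorded direction might drive the keep/discard decision, in Python. The helper names `parse_direction` and `is_improvement` are hypothetical, and the value `lower_is_better` is an assumed counterpart to the `higher_is_better` value shown above; the document does not spell it out.

```python
def parse_direction(first_line: str) -> bool:
    """Return True if the comment line declares higher_is_better."""
    # Drop the leading "#" and spaces, then split "key: value".
    key, _, value = first_line.lstrip("# ").partition(":")
    if key.strip() != "metric_direction":
        raise ValueError("first line is not a metric_direction comment")
    value = value.strip()
    # Assumption: "lower_is_better" is the only other accepted value.
    if value not in ("higher_is_better", "lower_is_better"):
        raise ValueError(f"unknown metric direction: {value!r}")
    return value == "higher_is_better"


def is_improvement(metric: float, previous_best: float,
                   higher_is_better: bool) -> bool:
    """Compare a new measurement against the previous best."""
    delta = metric - previous_best
    # A positive delta is good only when higher is better, and vice versa.
    return delta > 0 if higher_is_better else delta < 0
```

With coverage as the metric, `is_improvement(87.1, 85.2, True)` holds, while a drop from 87.1 to 86.5 does not.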