# Emdash Perf Monitor
Tracks cold start / TTFB of the emdash demo sites over time from multiple regions. Two sites are measured in parallel so the effect of Astro's experimental cache provider can be compared head-to-head:
- `blog` -- `blog-demo.emdashcms.com` (baseline, stock Astro)
- `cache` -- `cache-demo.emdashcms.com` (prerelease Astro with `cacheCloudflare()` enabled)
Each measurement row is tagged with a `site` column matching one of those ids.
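The registry lives in `src/routes.ts` (see "Operational notes" below). A minimal sketch of its shape -- `id` and `targetUrl` are documented field names, while `workerName` and the cache site's Worker name are assumptions:
```ts
// Sketch of the site registry in src/routes.ts. `id` and `targetUrl` are
// documented field names; `workerName` (and the cache site's Worker name)
// are assumptions.
export interface Site {
  id: string;         // stable id, stored in perf_results.site
  targetUrl: string;  // URL the probes time with fetch()
  workerName: string; // Worker whose builds deploy this site
}

export const SITES: Site[] = [
  { id: "blog",  targetUrl: "https://blog-demo.emdashcms.com",  workerName: "blog-demo" },
  { id: "cache", targetUrl: "https://cache-demo.emdashcms.com", workerName: "cache-demo" },
];
```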
## Architecture
- **Coordinator Worker** (`emdash-perf-coordinator`) owns the D1 database, cron trigger, queue consumer, HTTP API, and frontend dashboard. Served at `https://perf.emdashcms.com`.
- **4 Probe Workers** (`emdash-perf-probe-{use,euw,ape,aps}`) are placed near AWS regions via `placement.region`. They receive measurement requests from the coordinator via service bindings and run `fetch()` timing from their placed location.
- **D1 database** (`emdash_perf`) stores all measurements, tagged by `source`: `deploy` (queue-triggered, has SHA + PR) or `cron` (ambient baseline, untagged).
- **Cloudflare Queue** (`emdash-perf-deploy-events`) subscribes to `cf.workersBuilds.worker.build.succeeded` events. The coordinator consumes these, filters for the baseline demo Worker, resolves the PR via the GitHub API, and runs a measurement against every registered site. This is the primary attribution path; see `src/routes.ts` for the site registry.
All five Workers are built from this directory by the Cloudflare Vite plugin -- the coordinator entry is `src/index.ts` and the four probes are defined as `auxiliaryWorkers` in `vite.config.ts`.
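A sketch of that wiring: `auxiliaryWorkers` is the plugin's option for building extra Workers alongside the entry Worker, but the probe config paths below are assumptions:
```ts
import { defineConfig } from "vite";
import { cloudflare } from "@cloudflare/vite-plugin";

// Coordinator is the entry Worker; the four probes build as auxiliary
// Workers. The probe config paths are illustrative only.
export default defineConfig({
  plugins: [
    cloudflare({
      configPath: "./wrangler.jsonc", // emdash-perf-coordinator
      auxiliaryWorkers: [
        { configPath: "./probes/use/wrangler.jsonc" },
        { configPath: "./probes/euw/wrangler.jsonc" },
        { configPath: "./probes/ape/wrangler.jsonc" },
        { configPath: "./probes/aps/wrangler.jsonc" },
      ],
    }),
  ],
});
```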
## Measurement triggers
| Trigger | When | `source` | Sites | SHA | PR | On graph? | Persisted? |
| ---------------------- | ---------------------------------- | -------- | ------------ | ---------- | -------- | --------- | ---------- |
| Queue event | Every successful `blog-demo` build | `deploy` | all | from event | resolved | yes | yes |
| Cron (`*/30 * * * *`) | Every 30 min | `cron` | all | null | null | yes | yes |
| `pnpm trigger` | Private/quiet check (default) | n/a | all (or one) | n/a | n/a | no | **no** |
| `pnpm trigger --store` | Manual, persisted | `manual` | all (or one) | optional | optional | **no** | yes |
The queue is the deploy-attribution path. The cron is a safety net that fills gaps between deploys and catches regressions the queue might miss.
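A rough sketch of that consumer in the coordinator. The payload fields (`workerName`, `commitSha`) and the helpers `resolvePr` / `measureAllSites` are assumptions; only the event type and the filter-then-measure behavior come from this README:
```ts
import { TRIGGER_WORKER_NAME } from "./routes";

// Hypothetical queue consumer. Payload fields on the
// cf.workersBuilds.worker.build.succeeded event are illustrative.
export default {
  async queue(batch: MessageBatch<unknown>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      const event = msg.body as { workerName?: string; commitSha?: string };
      if (event.workerName !== TRIGGER_WORKER_NAME) {
        msg.ack(); // build of some other Worker -- discard, don't retry
        continue;
      }
      const pr = await resolvePr(event.commitSha!); // GitHub lookup (sketched below)
      await measureAllSites(env, { source: "deploy", sha: event.commitSha, pr });
      msg.ack();
    }
  },
} satisfies ExportedHandler<Env>;
```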
`pnpm trigger` defaults to ephemeral: the probes run for real, but the coordinator skips the database insert and just returns the results, which the CLI prints to stdout. Use this for private/local checks you don't want on the dashboard.
Passing `--store` persists the run as `source=manual`. Stored manual runs land in the results table with a yellow `manual` badge but are excluded from the line chart, the summary cards, and the 7-day rolling medians so they don't skew the baseline.
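That exclusion is just a `source` filter at query time. A sketch of the kind of D1 query the chart endpoint might run -- only `perf_results.site` and the `source` column are confirmed by this README; `ttfb_ms` and `measured_at` are illustrative:
```ts
// Hypothetical chart query against the emdash_perf D1 database.
const sevenDaysAgo = Date.now() - 7 * 24 * 60 * 60 * 1000;
const { results } = await env.DB.prepare(
  `SELECT site, region, ttfb_ms, measured_at
     FROM perf_results
    WHERE source IN ('deploy', 'cron')  -- manual runs never reach the chart
      AND measured_at >= ?1
    ORDER BY measured_at ASC`,
).bind(sevenDaysAgo).all();
```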
## Manual triggers
```bash
# Default: run the probes, print results, record nothing.
# First invocation opens a browser for Cloudflare Access login; subsequent
# invocations reuse the token until the Access session expires.
pnpm trigger
# Persist the run as source=manual (appears in the results table)
pnpm trigger -- --store --note "pre-cold-start-fix baseline"
# Attach a SHA and/or PR number to a persisted run
pnpm trigger -- --store --sha 1a2b3c4 --pr 532 --note "PR #532 preview"
```
Auth is handled by a Cloudflare Access policy on `POST /api/trigger`.
## First-time setup
```bash
# 1. Create the D1 database and apply the initial schema
wrangler d1 create emdash_perf
# copy the database_id into wrangler.jsonc
wrangler d1 execute emdash_perf --remote --file=schema.sql
pnpm db:migrations:apply # any incremental migrations on top
# 2. Create the deploy events queue and DLQ
wrangler queues create emdash-perf-deploy-events
wrangler queues create emdash-perf-deploy-events-dlq
# 3. Build and deploy all 5 Workers
pnpm deploy
# 4. Subscribe the queue to Workers Builds events.
# (No wrangler command for this yet -- use the CF dashboard or API:
# https://developers.cloudflare.com/queues/event-subscriptions/manage-event-subscriptions/)
# Source: Workers Builds
# Events: build.succeeded (at minimum)
# Queue: emdash-perf-deploy-events
# 5. (Optional, to enable manual triggers) Add a Cloudflare Access policy
# on POST /api/trigger. See "Manual triggers" above.
```
No secrets required. PR lookup hits the public GitHub API unauthenticated
(60 req/hr limit, plenty for one lookup per deploy).
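That lookup amounts to one call to GitHub's "list pull requests associated with a commit" endpoint. A sketch of the kind of helper `src/github.ts` might contain; the repo slug is a placeholder:
```ts
// Unauthenticated PR resolution -- OWNER/REPO is a placeholder, not the
// real repo slug.
async function resolvePr(sha: string): Promise<number | null> {
  const res = await fetch(
    `https://api.github.com/repos/OWNER/REPO/commits/${sha}/pulls`,
    {
      headers: {
        accept: "application/vnd.github+json",
        "user-agent": "emdash-perf-monitor", // GitHub rejects requests without one
      },
    },
  );
  if (!res.ok) return null; // rate-limited or unknown SHA: record the run without a PR
  const pulls = (await res.json()) as Array<{ number: number }>;
  return pulls[0]?.number ?? null;
}
```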
## Deploy order
The coordinator's service bindings require the probes to exist first. `pnpm deploy` handles this: it builds, deploys all 4 probes, then deploys the coordinator.
## Dev
```bash
pnpm dev # Vite dev server, all 5 Workers via Miniflare
```
Open `http://localhost:5173` for the dashboard. API is at `/api/*`. Queue events can't be exercised locally without manual message publishing -- rely on the live environment or the next cron tick to verify the measurement path.
Local manual trigger (no Access locally):
```bash
curl -sS -X POST http://localhost:5173/api/trigger \
-H 'content-type: application/json' \
-d '{"note":"local test"}'
```
## Endpoints
| Endpoint | Method | Auth | Purpose |
| -------------- | ------ | --------- | ------------------------------------------------- |
| `/` | GET | none | Dashboard |
| `/api/config` | GET | none | Target URL, available routes and regions |
| `/api/summary` | GET | none | Latest result per route/region + rolling medians |
| `/api/results` | GET | none | Filtered historical results |
| `/api/chart` | GET | none | Time series for charting (with PR markers) |
| `/api/trigger` | POST | CF Access | Run an ad-hoc measurement, tagged `source=manual` |
All GET endpoints are read-only. `POST /api/trigger` is the only state-changing endpoint and is expected to be protected by a Cloudflare Access policy at the edge.
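For scripting against the read endpoints, plain `fetch` works. The query parameter names here are illustrative, not documented; check the coordinator's route handlers for the real filters:
```ts
// Illustrative read -- parameter names (site, source) are assumptions.
const res = await fetch(
  "https://perf.emdashcms.com/api/results?site=blog&source=deploy",
);
if (!res.ok) throw new Error(`perf API returned ${res.status}`);
const rows = await res.json();
console.log(rows);
```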
## Schema changes
D1's native migrations are wired up (`migrations_dir` in `wrangler.jsonc`).
```bash
pnpm db:migrations:list # show pending migrations
pnpm db:migrations:apply # apply pending migrations
pnpm db:migrations:create # scaffold a new migration file
```
`schema.sql` is the desired end state for fresh installs only. For incremental changes on an existing database, add a file under `migrations/` and apply it -- editing `schema.sql` alone has no effect on a database that already exists.
## Types
Binding types come from `wrangler types`, which reads `wrangler.jsonc` and writes `worker-configuration.d.ts`. The generated file is committed so `tsc` doesn't need wrangler to run first.
Re-run after any binding change:
```bash
pnpm cf-typegen
```
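The generated file declares an `Env` interface with one member per binding. Roughly, with binding names invented for illustration:
```ts
// Approximate shape of the generated Env in worker-configuration.d.ts.
// Binding names are guesses -- wrangler.jsonc is the source of truth.
interface Env {
  DB: D1Database;       // emdash_perf
  DEPLOY_EVENTS: Queue; // emdash-perf-deploy-events
  PROBE_USE: Fetcher;   // service binding to emdash-perf-probe-use
  PROBE_EUW: Fetcher;
  PROBE_APE: Fetcher;
  PROBE_APS: Fetcher;
}
```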
## Operational notes
- **Trigger worker name**: `TRIGGER_WORKER_NAME` in `src/routes.ts` is the Worker whose `build.succeeded` event drives deploy-attributed runs. Events for any other Worker are discarded (the cron job still measures every site on its own schedule). Since every registered site rebuilds from the same main-branch commit, one event triggers a measurement for all of them. If the baseline demo is ever renamed, update this constant.
- **Adding a site**: add an entry to `SITES` in `src/routes.ts` with a stable `id` (stored in `perf_results.site`), `targetUrl`, and Worker name. Existing rows continue to use their recorded site id.
- **PR lookup**: hits the public GitHub API unauthenticated (60 req/hr per IP). One call per deploy, so rate limits are a non-issue. If deploy rate ever gets anywhere near that, add a fine-grained PAT via `wrangler secret put GITHUB_TOKEN` and pass it in `src/github.ts`.
- **DLQ**: failed messages retry 3x, then go to `emdash-perf-deploy-events-dlq`. Check this periodically if deploy-attributed results stop appearing.