Files
emdash-patch-imageupload/infra/perf-monitor

Emdash Perf Monitor

Tracks cold start / TTFB of the emdash demo sites over time from multiple regions. Two sites are measured in parallel so the effect of Astro's experimental cache provider can be compared head-to-head:

  • blog -- blog-demo.emdashcms.com (baseline, catalog Astro)
  • cache -- cache-demo.emdashcms.com (prerelease Astro with cacheCloudflare() enabled)

Each measurement row is tagged with a site column matching one of those ids.

Architecture

  • Coordinator Worker (emdash-perf-coordinator) owns the D1 database, cron trigger, queue consumer, HTTP API, and frontend dashboard. Served at https://perf.emdashcms.com.
  • 4 Probe Workers (emdash-perf-probe-{use,euw,ape,aps}) are placed near AWS regions via placement.region. They receive measurement requests from the coordinator via service bindings and run fetch() timing from their placed location.
  • D1 database (emdash_perf) stores all measurements, tagged by source: deploy (queue-triggered, has SHA + PR) or cron (ambient baseline, untagged).
  • Cloudflare Queue (emdash-perf-deploy-events) subscribes to cf.workersBuilds.worker.build.succeeded events. The coordinator consumes these, filters for the baseline demo Worker, resolves the PR via the GitHub API, and runs a measurement against every registered site. This is the primary attribution path; see src/routes.ts for the site registry.

All five Workers are built from this directory by the Cloudflare Vite plugin -- the coordinator entry is src/index.ts and the four probes are defined as auxiliaryWorkers in vite.config.ts.

Measurement triggers

Trigger When source Sites SHA PR On graph? Persisted?
Queue event Every successful blog-demo build deploy all from event resolved yes yes
Cron (*/30 * * * *) Every 30 min cron all null null yes yes
pnpm trigger Private/quiet check (default) n/a all (or one) n/a n/a no no
pnpm trigger --store Manual, persisted manual all (or one) optional optional no yes

The queue is the deploy-attribution path. The cron is a safety net that fills gaps between deploys and catches regressions the queue might miss.

pnpm trigger defaults to ephemeral: the probes run for real, but the coordinator skips the database insert and just returns the results to stdout. Use this for private/local checks you don't want on the dashboard.

Passing --store persists the run as source=manual. Stored manual runs land in the results table with a yellow manual badge but are excluded from the line chart, the summary cards, and the 7-day rolling medians so they don't skew the baseline.

Manual triggers

# Default: run the probes, print results, record nothing.
# First invocation opens a browser for Cloudflare Access login; subsequent
# invocations reuse the token until the Access session expires.
pnpm trigger

# Persist the run as source=manual (appears in the results table)
pnpm trigger -- --store --note "pre-cold-start-fix baseline"

# Attach a SHA and/or PR number to a persisted run
pnpm trigger -- --store --sha 1a2b3c4 --pr 532 --note "PR #532 preview"

Auth is handled by a Cloudflare Access policy on POST /api/trigger

First-time setup

# 1. Create the D1 database and apply the initial schema
wrangler d1 create emdash_perf
# copy the database_id into wrangler.jsonc

wrangler d1 execute emdash_perf --remote --file=schema.sql
pnpm db:migrations:apply  # any incremental migrations on top

# 2. Create the deploy events queue and DLQ
wrangler queues create emdash-perf-deploy-events
wrangler queues create emdash-perf-deploy-events-dlq

# 3. Build and deploy all 5 Workers
pnpm deploy

# 4. Subscribe the queue to Workers Builds events.
# (No wrangler command for this yet -- use the CF dashboard or API:
# https://developers.cloudflare.com/queues/event-subscriptions/manage-event-subscriptions/)
# Source: Workers Builds
# Events: build.succeeded (at minimum)
# Queue: emdash-perf-deploy-events

# 5. (Optional, to enable manual triggers) Add a Cloudflare Access policy
# on POST /api/trigger. See "Access setup" above.

No secrets required. PR lookup hits the public GitHub API unauthenticated (60 req/hr limit, plenty for one lookup per deploy).

Deploy order

The coordinator's service bindings require the probes to exist first. pnpm deploy handles this: it builds, deploys all 4 probes, then deploys the coordinator.

Dev

pnpm dev  # Vite dev server, all 5 Workers via Miniflare

Open http://localhost:5173 for the dashboard. API is at /api/*. Queue events can't be exercised locally without manual message publishing -- rely on the live environment or the next cron tick to verify the measurement path.

Local manual trigger (no Access locally):

curl -sS -X POST http://localhost:5173/api/trigger \
  -H 'content-type: application/json' \
  -d '{"note":"local test"}'

Endpoints

Endpoint Method Auth Purpose
/ GET none Dashboard
/api/config GET none Target URL, available routes and regions
/api/summary GET none Latest result per route/region + rolling medians
/api/results GET none Filtered historical results
/api/chart GET none Time series for charting (with PR markers)
/api/trigger POST CF Access Run an ad-hoc measurement, tagged source=manual

All GET endpoints are read-only. POST /api/trigger is the only state-changing endpoint and is expected to be protected by a Cloudflare Access policy at the edge.

Schema changes

D1's native migrations are wired up (migrations_dir in wrangler.jsonc).

pnpm db:migrations:list    # show pending migrations
pnpm db:migrations:apply   # apply pending migrations
pnpm db:migrations:create  # scaffold a new migration file

schema.sql is the desired end state for fresh installs only. For incremental changes on an existing database, add a file under migrations/ and apply it -- don't rely on editing schema.sql to take effect.

Types

Binding types come from wrangler types, which reads wrangler.jsonc and writes worker-configuration.d.ts. The generated file is committed so tsc doesn't need wrangler to run first.

Re-run after any binding change:

pnpm cf-typegen

Operational notes

  • Trigger worker name: TRIGGER_WORKER_NAME in src/routes.ts is the Worker whose build.succeeded event drives deploy-attributed runs. Events for any other Worker are discarded (the cron job still measures every site on its own schedule). Since every registered site rebuilds from the same main-branch commit, one event triggers a measurement for all of them. If the baseline demo is ever renamed, update this constant.
  • Adding a site: add an entry to SITES in src/routes.ts with a stable id (stored in perf_results.site), targetUrl, and Worker name. Existing rows continue to use their recorded site id.
  • PR lookup: hits the public GitHub API unauthenticated (60 req/hr per IP). One call per deploy, so rate limits are a non-issue. If deploy rate ever gets anywhere near that, add a fine-grained PAT via wrangler secret put GITHUB_TOKEN and pass it in src/github.ts.
  • DLQ: failed messages retry 3x, then go to emdash-perf-deploy-events-dlq. Check this periodically if deploy-attributed results stop appearing.