consentos/CLAUDE.md
James Cottrill fbf26453f2 feat: initial public release
ConsentOS — a privacy-first cookie consent management platform.

Self-hosted, source-available alternative to OneTrust, Cookiebot, and
CookieYes. Full standards coverage (IAB TCF v2.2, GPP v1, Google
Consent Mode v2, GPC, Shopify Customer Privacy API), multi-tenant
architecture with role-based access, configuration cascade
(system → org → group → site → region), dark-pattern detection in
the scanner, and a tamper-evident consent record audit trail.

This is the initial public release. Prior development history is
retained internally.

See README.md for the feature list, architecture overview, and
quick-start instructions. Licensed under the Elastic Licence 2.0 —
self-host freely; do not resell as a managed service.
2026-04-14 09:18:18 +00:00

# ConsentOS
## Project Overview
ConsentOS is a multi-tenant cookie consent management platform — a source-available alternative to OneTrust, Cookiebot, and CookieYes — that provides cookie scanning, consent collection, auto-blocking, and compliance checking across many sites with per-site configuration.
The platform delivers a single `<script>` tag that site owners embed. This script handles consent collection, cookie blocking, IAB TCF v2.2, and Google Consent Mode v2 signalling. A separate admin dashboard allows site owners to manage configurations, review scan results, and check compliance.
**Public repo:** [github.com/consentos/consentos](https://github.com/consentos/consentos)
**Domain:** [consentos.dev](https://consentos.dev)
## Architecture Summary
```
CDN (static assets)
├── consent-loader.js (~2KB gzipped, sync bootstrap)
├── consent-bundle-{v}.js (~25KB gzipped, full banner + blocker)
├── site-config-{id}.json (cached site configuration)
└── translations-{locale}.json
Client Browser
├── Script Interceptor (MutationObserver + createElement override)
├── Cookie Blocker (document.cookie proxy, Storage proxy)
├── Banner UI (Shadow DOM, customisable, a11y-compliant)
├── TCF v2.2 API (__tcfapi)
├── Google Consent Mode v2 (gtag integration)
├── Client-side Cookie Reporter
└── Consent State Manager
API Layer (FastAPI)
├── Config API — site/org CRUD, banner config, allow-lists, CDN publishing
├── Consent API — consent recording, retrieval, TC string generation, analytics
├── Scanner API — scan management, client-side cookie reports
└── Admin BFF — aggregates the above for the admin UI
Scanner Service (Python + Playwright)
├── Scheduled headless browser crawls
├── Cookie discovery and script attribution
└── Auto-categorisation via known cookies DB
PostgreSQL — all persistent state
Redis — caching, rate limiting, Celery job queue
Admin UI (Vite + React + TypeScript)
├── Site management, configuration editor
├── Cookie manager, allow-list management
├── Banner builder (visual editor with live preview)
├── Compliance checker (GDPR, CNIL, CCPA, ePrivacy, LGPD)
└── Analytics dashboard (consent rates, trends, regional)
```
## Technology Stack
### Backend (`apps/api/`)
- **Language:** Python 3.12+
- **Framework:** FastAPI
- **ORM:** SQLAlchemy 2.0 (async)
- **Migrations:** Alembic
- **Database:** PostgreSQL 16
- **Cache/Queue:** Redis + Celery
- **Auth:** JWT (org-scoped, role-based)
- **Validation:** Pydantic v2
### Scanner (`apps/scanner/`)
- **Language:** Python 3.12+
- **Browser automation:** Playwright
- **Job scheduling:** Celery + Redis
### Banner Script (`apps/banner/`)
- **Language:** TypeScript
- **Build:** Rollup (outputs IIFE bundles)
- **UI isolation:** Shadow DOM
- **Standards:** IAB TCF v2.2, Google Consent Mode v2
### Admin UI (`apps/admin-ui/`)
- **Framework:** Vite + React + TypeScript
- **Primary UI:** shadcn/ui + TailwindCSS
- **Complex components:** MUI (DataGrid for tables, charts)
- **Server state:** TanStack Query
- **Client state:** Zustand
- **Routing:** React Router v6
- **Forms:** React Hook Form + Zod
- **i18n:** react-i18next
### Infrastructure
- **Containerisation:** Docker / Docker Compose
- **Orchestration:** Kubernetes (Helm chart)
- **CDN:** Cloud-agnostic (Cloudflare, Cloud CDN, or CloudFront)
## Project Structure
```
consent-platform/
├── apps/
│ ├── api/ # FastAPI backend
│ │ ├── src/
│ │ │ ├── config/ # Pydantic settings, environment
│ │ │ ├── models/ # SQLAlchemy models
│ │ │ ├── schemas/ # Pydantic request/response schemas
│ │ │ ├── routers/ # API route handlers
│ │ │ │ ├── config.py # site/org config endpoints
│ │ │ │ ├── consent.py # consent recording/retrieval
│ │ │ │ ├── scanner.py # scan management
│ │ │ │ ├── analytics.py # analytics endpoints
│ │ │ │ ├── compliance.py # compliance checker
│ │ │ │ └── auth.py # authentication
│ │ │ ├── services/ # Business logic
│ │ │ │ ├── consent.py
│ │ │ │ ├── tcf.py # TC string encoding/decoding
│ │ │ │ ├── gcm.py # Google Consent Mode logic
│ │ │ │ ├── compliance.py # Compliance rule engine
│ │ │ │ ├── publisher.py # CDN publishing
│ │ │ │ └── classification.py # Cookie auto-categorisation
│ │ │ ├── db/ # Database connection, session
│ │ │ └── main.py
│ │ ├── tests/
│ │ ├── alembic/
│ │ ├── pyproject.toml
│ │ └── Dockerfile
│ │
│ ├── scanner/ # Cookie scanner service
│ │ ├── src/
│ │ │ ├── crawler.py # Playwright-based crawler
│ │ │ ├── classifier.py # Cookie classification
│ │ │ ├── scheduler.py # Scan job scheduling
│ │ │ └── worker.py # Celery worker
│ │ ├── Dockerfile
│ │ └── pyproject.toml
│ │
│ ├── admin-ui/ # Vite + React + TS admin dashboard
│ │ ├── src/
│ │ │ ├── components/
│ │ │ ├── pages/
│ │ │ ├── hooks/
│ │ │ ├── api/ # TanStack Query hooks
│ │ │ ├── stores/ # Zustand stores
│ │ │ └── i18n/
│ │ ├── package.json
│ │ ├── vite.config.ts
│ │ ├── tsconfig.json
│ │ └── tailwind.config.ts
│ │
│ └── banner/ # Client-side consent banner
│ ├── src/
│ │ ├── loader.ts # Lightweight bootstrap (~2KB)
│ │ ├── banner.ts # Banner UI engine
│ │ ├── blocker.ts # Script/cookie interceptor
│ │ ├── tcf.ts # TCF v2.2 API implementation
│ │ ├── gcm.ts # Google Consent Mode v2
│ │ ├── reporter.ts # Client-side cookie reporter
│ │ ├── consent.ts # Consent state management
│ │ ├── i18n.ts # Translation loader
│ │ ├── a11y.ts # Accessibility utilities
│ │ └── types.ts
│ ├── rollup.config.js
│ ├── package.json
│ └── tsconfig.json
├── packages/
│ └── shared/ # Shared types, constants, utils
├── helm/consentos/ # Kubernetes deployment
├── docker-compose.yml
├── Makefile
└── README.md
```
## Key Data Entities
- **organisations** — multi-tenant root, each org has multiple sites
- **users** — org-scoped with roles: owner, admin, editor, viewer
- **sites** — a domain being managed (e.g. example.com), belongs to an org
- **site_configs** — full configuration per site: blocking mode, TCF settings, GCM defaults, banner config JSON, scan schedule, consent expiry
- **cookie_categories** — taxonomy (necessary, functional, analytics, marketing, personalisation) with TCF purpose and GCM consent type mappings
- **cookies** — discovered cookies per site with metadata, vendor, category, review status
- **cookie_allow_list** — approved cookies per site with category assignment
- **known_cookies** — shared knowledge base of known cookie patterns for auto-categorisation
- **consent_records** — audit trail of every consent event (partitioned by month)
- **scan_jobs** / **scan_results** — scanning pipeline state and results
- **translations** — i18n strings per site per locale
## Configuration Hierarchy
Configuration resolves in this order (each level overrides the previous):
```
System Defaults (code) → Organisation Defaults → Site Config → Regional Overrides
```
The `site_configs.regional_modes` JSONB field allows per-region blocking mode:
```json
{"EU": "opt_in", "GB": "opt_in", "US-CA": "opt_out", "BR": "opt_in", "DEFAULT": "opt_in"}
```
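
The cascade above can be sketched as a left-to-right merge followed by a regional lookup. This is a sketch under stated assumptions, not the project's implementation: `SYSTEM_DEFAULTS`, `resolve_config`, `blocking_mode_for`, the `consent_expiry_days` key, and the fallback from a sub-region such as `US-CA` to its country code are all hypothetical.

```python
from typing import Any

# Hypothetical system defaults; the real values live in the API's settings code.
SYSTEM_DEFAULTS: dict[str, Any] = {
    "blocking_mode": "opt_in",
    "consent_expiry_days": 365,
}


def resolve_config(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge layers left to right; later layers override earlier ones."""
    resolved: dict[str, Any] = {}
    for layer in layers:
        # Skip None values so a level only overrides what it actually sets.
        resolved.update({k: v for k, v in layer.items() if v is not None})
    return resolved


def blocking_mode_for(regional_modes: dict[str, str], region: str, fallback: str) -> str:
    """Try the exact region ("US-CA"), then its country ("US"), then DEFAULT."""
    country = region.split("-")[0]
    for key in (region, country, "DEFAULT"):
        if key in regional_modes:
            return regional_modes[key]
    return fallback


org_defaults = {"consent_expiry_days": 180}
site_config = {"blocking_mode": "opt_out", "consent_expiry_days": None}
regional_modes = {"EU": "opt_in", "GB": "opt_in", "US-CA": "opt_out", "BR": "opt_in", "DEFAULT": "opt_in"}

config = resolve_config(SYSTEM_DEFAULTS, org_defaults, site_config)
mode = blocking_mode_for(regional_modes, "US-CA", config["blocking_mode"])
```

Mapping a visitor's country to a grouped key such as `EU` is left out here; that step depends on how the platform resolves GeoIP results.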
## Consent Flow
1. Site loads `consent-loader.js` (sync, before other scripts)
2. Loader reads existing consent cookie — if valid, applies consent state and exits
3. If no consent: installs script interceptor, blocks non-essential scripts/cookies
4. Sets Google Consent Mode defaults (`gtag('consent', 'default', {...})`)
5. Installs `__tcfapi` stub for TCF v2.2
6. Async-loads full banner bundle + site config from CDN
7. Banner displays; user interacts
8. On consent action: generates TC string, sets first-party cookie, calls `gtag('consent', 'update', {...})`, releases blocked scripts by category
9. POSTs consent record to Consent API for server-side audit storage
10. Fires `consent-change` custom event + dataLayer push for GTM
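
Steps 4 and 8 both pass a consent-state object to `gtag`. The sketch below shows one way that object might be derived from the accepted cookie categories. The consent type names are the ones Google Consent Mode v2 defines; the category-to-type mapping and the `gcm_state` helper are assumptions, since the project stores its real mapping on `cookie_categories`.

```python
# Assumed mapping from ConsentOS categories to Google Consent Mode v2 types.
CATEGORY_TO_GCM: dict[str, tuple[str, ...]] = {
    "necessary": ("security_storage",),
    "functional": ("functionality_storage",),
    "analytics": ("analytics_storage",),
    "marketing": ("ad_storage", "ad_user_data", "ad_personalization"),
    "personalisation": ("personalization_storage",),
}


def gcm_state(accepted: set[str]) -> dict[str, str]:
    """Build the payload for gtag('consent', 'default' or 'update', {...})."""
    state: dict[str, str] = {}
    for category, consent_types in CATEGORY_TO_GCM.items():
        # Necessary cookies are always treated as granted.
        granted = category == "necessary" or category in accepted
        for consent_type in consent_types:
            state[consent_type] = "granted" if granted else "denied"
    return state


defaults = gcm_state(set())         # step 4: nothing accepted yet
update = gcm_state({"analytics"})   # step 8: user accepted analytics only
```

In the browser this logic would live in the TypeScript banner; it is shown in Python here to match the backend's `services/gcm.py`.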
## Banner Script Architecture
The banner is split into two files for performance:
- **consent-loader.js** (~2KB gzipped) — synchronous critical path: consent cookie read, GCM defaults, TCF stub, script interceptor installation, async bundle load
- **consent-bundle-{version}.js** (~25KB gzipped) — full UI, consent engine, TCF encoder, reporter
The banner UI renders inside **Shadow DOM** for complete style isolation from the host site.
**Display modes:** overlay (full-screen modal), bottom_banner, top_banner, corner_popup, inline (into specific DOM element)
**Auto-blocking works by:**
- Overriding `document.createElement` to intercept `<script>` tag creation
- `MutationObserver` on `<head>` and `<body>` for dynamically inserted scripts
- Proxying `document.cookie` setter to block writes from non-essential categories
- Wrapping `localStorage.setItem` and `sessionStorage.setItem`
- Maintaining a queue of blocked scripts, released per-category when consent is granted
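
The last point, a queue released per category, can be modelled independently of the DOM. The following is a language-neutral sketch written in Python; the real interceptor runs in the browser in TypeScript, and the `BlockedScriptQueue` class with its `intercept` and `grant` methods is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BlockedScriptQueue:
    """Holds intercepted script URLs until their category is consented to."""

    granted: set[str] = field(default_factory=lambda: {"necessary"})
    pending: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))

    def intercept(self, src: str, category: str) -> bool:
        """Return True if the script may run now; otherwise queue it."""
        if category in self.granted:
            return True
        self.pending[category].append(src)
        return False

    def grant(self, category: str) -> list[str]:
        """Record consent and return the queued scripts to release, in order."""
        self.granted.add(category)
        return self.pending.pop(category, [])


queue = BlockedScriptQueue()
ran_now = queue.intercept("https://example.com/app.js", "necessary")
blocked = queue.intercept("https://example.com/analytics.js", "analytics")
released = queue.grant("analytics")
```

Keeping the queue ordered per category means released scripts can be re-inserted in the order the page originally requested them.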
## Compliance Frameworks
The compliance engine is rule-based. Each framework is a set of `ComplianceRule` objects:
- **GDPR** — opt-in, reject = accept prominence, granular consent, proof of consent, no cookie walls, no pre-ticked boxes
- **CNIL** — all GDPR rules plus: Tout refuser on first layer, max 13-month cookie lifetime, max 6-month consent retention, re-consent every 6 months
- **CCPA/CPRA** — opt-out model, Do Not Sell link, honour GPC signal, under-16 opt-in
- **ePrivacy** — consent for non-essential, strictly necessary exempt
- **LGPD** — consent or legitimate interest basis, identify data controller
Rules output: severity (critical/warning/info), message, recommendation. Aggregated into per-framework scores.
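
A rule-based engine of this shape might look like the sketch below. Only the `ComplianceRule` name and the severity, message, and recommendation outputs come from the description above; the `check` callable, the two sample rules, the config keys, and the penalty-based scoring are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Assumed penalty per failed rule; the project's real scoring is not shown here.
SEVERITY_PENALTY = {"critical": 40, "warning": 15, "info": 5}


@dataclass(frozen=True)
class ComplianceRule:
    rule_id: str
    severity: str  # "critical", "warning" or "info"
    message: str
    recommendation: str
    check: Callable[[dict[str, Any]], bool]  # True when the config passes


GDPR_RULES = [
    ComplianceRule(
        rule_id="gdpr-opt-in",
        severity="critical",
        message="Non-essential cookies must be blocked until consent is given.",
        recommendation="Set blocking_mode to opt_in.",
        check=lambda cfg: cfg.get("blocking_mode") == "opt_in",
    ),
    ComplianceRule(
        rule_id="gdpr-no-pre-ticked",
        severity="warning",
        message="Consent categories must not be pre-ticked.",
        recommendation="Leave optional categories unticked by default.",
        check=lambda cfg: not cfg.get("pre_ticked_categories"),
    ),
]


def evaluate(rules: list[ComplianceRule], cfg: dict[str, Any]) -> tuple[int, list[ComplianceRule]]:
    """Return a 0-100 framework score and the rules that failed."""
    failed = [rule for rule in rules if not rule.check(cfg)]
    penalty = sum(SEVERITY_PENALTY[rule.severity] for rule in failed)
    return max(0, 100 - penalty), failed


score, failures = evaluate(GDPR_RULES, {"blocking_mode": "opt_out", "pre_ticked_categories": []})
```

Expressing each framework as a plain list of rules would let CNIL reuse the GDPR list and append its own entries, matching the "all GDPR rules plus" wording above.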
## Coding Conventions
- **Language:** British English throughout (code comments, UI strings, documentation)
- **Python:** Use `pyproject.toml`, type hints everywhere, async where possible
- **SQL:** CTEs over subqueries, no `SELECT *`, explicit column lists
- **TypeScript:** strict mode, explicit return types on exported functions
- **Git:** conventional commits (`feat:`, `fix:`, `chore:`, `docs:`)
- **Testing:** pytest for Python, Vitest for TypeScript, aim for >80% coverage on services
- **API design:** RESTful, Pydantic schemas for all request/response bodies, consistent error format
- **Database:** UUID primary keys, `created_at`/`updated_at` timestamps on all tables, soft deletes where appropriate
## Development Environment
```bash
# Start everything
docker compose up -d
# Run migrations
make migrate
# Seed default data (cookie categories, known cookies)
make seed
# Run tests
make test
# Lint
make lint
```
Services in Docker Compose:
- `api` — FastAPI on port 8000
- `scanner` — Playwright scanner service
- `postgres` — PostgreSQL 16 on port 5432
- `redis` — Redis on port 6379
- `admin-ui` — Vite dev server on port 5173 (also dog-foods the banner)
## Implementation Phases
| Phase | Scope |
|-------|-------|
| 1 (Weeks 1-3) | DB schema, FastAPI scaffold, auth, site CRUD, basic banner, consent API, Docker Compose |
| 2 (Weeks 4-6) | TCF v2.2, Google Consent Mode v2, script interceptor/auto-blocking, cookie categories, allow-list, config hierarchy, admin UI scaffold |
| 3 (Weeks 7-8) | Playwright crawler, auto-categorisation, client-side reporter, scan scheduling, admin UI for scans |
| 4 (Weeks 9-10) | Compliance rule engine (GDPR/CNIL/CCPA/ePrivacy/LGPD), consent analytics API, compliance + analytics admin UI |
| 5 (Weeks 11-12) | Banner builder (visual editor), all display modes, full i18n, a11y audit, GeoIP, multi-domain, Helm chart, security hardening, load testing |
## Key External Standards
- **IAB TCF v2.2:** [IAB TCF Technical Specification](https://github.com/InteractiveAdvertisingBureau/GDPR-Transparency-and-Consent-Framework/blob/master/TCFv2/IAB%20Tech%20Lab%20-%20Consent%20string%20and%20vendor%20list%20formats%20v2.md)
- **Google Consent Mode v2:** [Google Developer Docs](https://developers.google.com/tag-platform/security/guides/consent)
- **Global Vendor List (GVL):** Loaded from IAB, cached, updated regularly
- **WCAG 2.1 AA:** Accessibility target for the banner UI