feat: initial public release

ConsentOS — a privacy-first cookie consent management platform.

Self-hosted, source-available alternative to OneTrust, Cookiebot, and
CookieYes. Full standards coverage (IAB TCF v2.2, GPP v1, Google
Consent Mode v2, GPC, Shopify Customer Privacy API), multi-tenant
architecture with role-based access, configuration cascade
(system → org → group → site → region), dark-pattern detection in
the scanner, and a tamper-evident consent record audit trail.

This is the initial public release. Prior development history is
retained internally.

See README.md for the feature list, architecture overview, and
quick-start instructions. Licensed under the Elastic Licence 2.0 —
self-host freely; do not resell as a managed service.

Author: James Cottrill
Date: 2026-04-13 14:20:15 +00:00
Commit: fbf26453f2
341 changed files with 62807 additions and 0 deletions

CLAUDE.md (new file, 291 lines)

@@ -0,0 +1,291 @@
# ConsentOS
## Project Overview
ConsentOS is a multi-tenant cookie consent management platform — a source-available alternative to OneTrust, Cookiebot, and CookieYes — that provides cookie scanning, consent collection, auto-blocking, and compliance checking across many sites with per-site configuration.

The platform delivers a single `<script>` tag that site owners embed. This script handles consent collection, cookie blocking, IAB TCF v2.2, and Google Consent Mode v2 signalling. A separate admin dashboard allows site owners to manage configurations, review scan results, and check compliance.

**Public repo:** [github.com/consentos/consentos](https://github.com/consentos/consentos)

**Domain:** [consentos.dev](https://consentos.dev)
## Architecture Summary
```
CDN (static assets)
├── consent-loader.js (~2KB gzipped, sync bootstrap)
├── consent-bundle-{v}.js (~25KB gzipped, full banner + blocker)
├── site-config-{id}.json (cached site configuration)
└── translations-{locale}.json
Client Browser
├── Script Interceptor (MutationObserver + createElement override)
├── Cookie Blocker (document.cookie proxy, Storage proxy)
├── Banner UI (Shadow DOM, customisable, a11y-compliant)
├── TCF v2.2 API (__tcfapi)
├── Google Consent Mode v2 (gtag integration)
├── Client-side Cookie Reporter
└── Consent State Manager
API Layer (FastAPI)
├── Config API — site/org CRUD, banner config, allow-lists, CDN publishing
├── Consent API — consent recording, retrieval, TC string generation, analytics
├── Scanner API — scan management, client-side cookie reports
└── Admin BFF — aggregates the above for the admin UI
Scanner Service (Python + Playwright)
├── Scheduled headless browser crawls
├── Cookie discovery and script attribution
└── Auto-categorisation via known cookies DB
PostgreSQL — all persistent state
Redis — caching, rate limiting, Celery job queue
Admin UI (Vite + React + TypeScript)
├── Site management, configuration editor
├── Cookie manager, allow-list management
├── Banner builder (visual editor with live preview)
├── Compliance checker (GDPR, CNIL, CCPA, ePrivacy, LGPD)
└── Analytics dashboard (consent rates, trends, regional)
```
## Technology Stack
### Backend (`apps/api/`)
- **Language:** Python 3.12+
- **Framework:** FastAPI
- **ORM:** SQLAlchemy 2.0 (async)
- **Migrations:** Alembic
- **Database:** PostgreSQL 16
- **Cache/Queue:** Redis + Celery
- **Auth:** JWT (org-scoped, role-based)
- **Validation:** Pydantic v2
### Scanner (`apps/scanner/`)
- **Language:** Python 3.12+
- **Browser automation:** Playwright
- **Job scheduling:** Celery + Redis
### Banner Script (`apps/banner/`)
- **Language:** TypeScript
- **Build:** Rollup (outputs IIFE bundles)
- **UI isolation:** Shadow DOM
- **Standards:** IAB TCF v2.2, Google Consent Mode v2
### Admin UI (`apps/admin-ui/`)
- **Framework:** Vite + React + TypeScript
- **Primary UI:** shadcn/ui + TailwindCSS
- **Complex components:** MUI (DataGrid for tables, charts)
- **Server state:** TanStack Query
- **Client state:** Zustand
- **Routing:** React Router v6
- **Forms:** React Hook Form + Zod
- **i18n:** react-i18next
### Infrastructure
- **Containerisation:** Docker / Docker Compose
- **Orchestration:** Kubernetes (Helm chart)
- **CDN:** Cloud-agnostic (Cloudflare, Cloud CDN, or CloudFront)
## Project Structure
```
consent-platform/
├── apps/
│ ├── api/ # FastAPI backend
│ │ ├── src/
│ │ │ ├── config/ # Pydantic settings, environment
│ │ │ ├── models/ # SQLAlchemy models
│ │ │ ├── schemas/ # Pydantic request/response schemas
│ │ │ ├── routers/ # API route handlers
│ │ │ │ ├── config.py # site/org config endpoints
│ │ │ │ ├── consent.py # consent recording/retrieval
│ │ │ │ ├── scanner.py # scan management
│ │ │ │ ├── analytics.py # analytics endpoints
│ │ │ │ ├── compliance.py # compliance checker
│ │ │ │ └── auth.py # authentication
│ │ │ ├── services/ # Business logic
│ │ │ │ ├── consent.py
│ │ │ │ ├── tcf.py # TC string encoding/decoding
│ │ │ │ ├── gcm.py # Google Consent Mode logic
│ │ │ │ ├── compliance.py # Compliance rule engine
│ │ │ │ ├── publisher.py # CDN publishing
│ │ │ │ └── classification.py # Cookie auto-categorisation
│ │ │ ├── db/ # Database connection, session
│ │ │ └── main.py
│ │ ├── tests/
│ │ ├── alembic/
│ │ ├── pyproject.toml
│ │ └── Dockerfile
│ │
│ ├── scanner/ # Cookie scanner service
│ │ ├── src/
│ │ │ ├── crawler.py # Playwright-based crawler
│ │ │ ├── classifier.py # Cookie classification
│ │ │ ├── scheduler.py # Scan job scheduling
│ │ │ └── worker.py # Celery worker
│ │ ├── Dockerfile
│ │ └── pyproject.toml
│ │
│ ├── admin-ui/ # Vite + React + TS admin dashboard
│ │ ├── src/
│ │ │ ├── components/
│ │ │ ├── pages/
│ │ │ ├── hooks/
│ │ │ ├── api/ # TanStack Query hooks
│ │ │ ├── stores/ # Zustand stores
│ │ │ └── i18n/
│ │ ├── package.json
│ │ ├── vite.config.ts
│ │ ├── tsconfig.json
│ │ └── tailwind.config.ts
│ │
│ └── banner/ # Client-side consent banner
│ ├── src/
│ │ ├── loader.ts # Lightweight bootstrap (~2KB)
│ │ ├── banner.ts # Banner UI engine
│ │ ├── blocker.ts # Script/cookie interceptor
│ │ ├── tcf.ts # TCF v2.2 API implementation
│ │ ├── gcm.ts # Google Consent Mode v2
│ │ ├── reporter.ts # Client-side cookie reporter
│ │ ├── consent.ts # Consent state management
│ │ ├── i18n.ts # Translation loader
│ │ ├── a11y.ts # Accessibility utilities
│ │ └── types.ts
│ ├── rollup.config.js
│ ├── package.json
│ └── tsconfig.json
├── packages/
│ └── shared/ # Shared types, constants, utils
├── helm/consentos/ # Kubernetes deployment
├── docker-compose.yml
├── Makefile
└── README.md
```
## Key Data Entities
- **organisations** — multi-tenant root, each org has multiple sites
- **users** — org-scoped with roles: owner, admin, editor, viewer
- **sites** — a domain being managed (e.g. example.com), belongs to an org
- **site_configs** — full configuration per site: blocking mode, TCF settings, GCM defaults, banner config JSON, scan schedule, consent expiry
- **cookie_categories** — taxonomy (necessary, functional, analytics, marketing, personalisation) with TCF purpose and GCM consent type mappings
- **cookies** — discovered cookies per site with metadata, vendor, category, review status
- **cookie_allow_list** — approved cookies per site with category assignment
- **known_cookies** — shared knowledge base of known cookie patterns for auto-categorisation
- **consent_records** — audit trail of every consent event (partitioned by month)
- **scan_jobs** / **scan_results** — scanning pipeline state and results
- **translations** — i18n strings per site per locale
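
`cookie_categories` maps each category to GCM consent types. As a minimal sketch, assuming a hypothetical mapping (the real rows may differ), this shows how a set of granted categories could become the object passed to `gtag('consent', 'update', {...})`. It is written in Python to match the backend; the banner's `gcm.ts` would do the equivalent in TypeScript. The GCM type names keep Google's American spelling because they are API identifiers.

```python
from collections.abc import Iterable

# Hypothetical category -> GCM v2 consent type mapping. In ConsentOS this
# would come from the cookie_categories table, not a hard-coded dict.
CATEGORY_TO_GCM: dict[str, tuple[str, ...]] = {
    "necessary": ("security_storage",),
    "functional": ("functionality_storage",),
    "personalisation": ("personalization_storage",),
    "analytics": ("analytics_storage",),
    "marketing": ("ad_storage", "ad_user_data", "ad_personalization"),
}

# Every GCM type the mapping can emit, in a stable order.
ALL_GCM_TYPES: tuple[str, ...] = tuple(
    sorted({gcm for types in CATEGORY_TO_GCM.values() for gcm in types})
)


def gcm_consent_state(granted_categories: Iterable[str]) -> dict[str, str]:
    """Build the object for gtag('consent', 'update', {...}).

    All types start as 'denied'; a type becomes 'granted' when any granted
    category maps to it. 'necessary' is always treated as granted.
    """
    granted = set(granted_categories) | {"necessary"}
    state = {gcm: "denied" for gcm in ALL_GCM_TYPES}
    for category in granted:
        for gcm in CATEGORY_TO_GCM.get(category, ()):
            state[gcm] = "granted"
    return state
```

For example, `gcm_consent_state(["analytics"])` marks `analytics_storage` and `security_storage` as `"granted"` and the other five types as `"denied"`. Unknown category names are ignored rather than raising, so a stale category name cannot grant anything by accident.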
## Configuration Hierarchy
Configuration resolves in this order (each level overrides the previous):
```
System Defaults (code) → Organisation Defaults → Site Config → Regional Overrides
```
The `site_configs.regional_modes` JSONB field allows per-region blocking mode:
```json
{"EU": "opt_in", "GB": "opt_in", "US-CA": "opt_out", "BR": "opt_in", "DEFAULT": "opt_in"}
```
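
A minimal sketch of the resolution order, assuming each level is stored as a (possibly partial) JSON object whose nested objects merge key by key. The key names used in the usage are hypothetical, as is the region lookup order (exact code, then country, then `EU` for member states, then `DEFAULT`).

```python
from copy import deepcopy
from typing import Any

Config = dict[str, Any]

# EU member states (ISO 3166-1 alpha-2), used to match the "EU" key.
EU_MEMBER_STATES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
})


def deep_merge(base: Config, override: Config) -> Config:
    """Return a new dict in which keys from `override` win over `base`.

    Nested dicts merge recursively; any other value replaces the base value.
    Neither input is mutated.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_config(*layers: Config) -> Config:
    """Fold layers left to right: system, organisation, then site."""
    resolved: Config = {}
    for layer in layers:
        resolved = deep_merge(resolved, layer)
    return resolved


def resolve_blocking_mode(regional_modes: dict[str, str], region: str) -> str:
    """Pick a blocking mode for a region code such as 'US-CA' or 'FR'.

    Assumed lookup order: exact code, country, 'EU' (member states only),
    then 'DEFAULT'.
    """
    country = region.split("-", 1)[0]
    candidates = [region, country]
    if country in EU_MEMBER_STATES:
        candidates.append("EU")
    candidates.append("DEFAULT")
    for key in candidates:
        if key in regional_modes:
            return regional_modes[key]
    return "opt_in"  # assumed fail-safe when no DEFAULT is configured
```

With the `regional_modes` example above, `resolve_blocking_mode(modes, "US-CA")` returns `"opt_out"`, `"FR"` falls through to the `EU` entry and `"US-NY"` to `DEFAULT`. Falling back to `opt_in` when nothing matches is a deliberately conservative choice.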
## Consent Flow
1. Site loads `consent-loader.js` (sync, before other scripts)
2. Loader reads existing consent cookie — if valid, applies consent state and exits
3. If no consent: installs script interceptor, blocks non-essential scripts/cookies
4. Sets Google Consent Mode defaults (`gtag('consent', 'default', {...})`)
5. Installs `__tcfapi` stub for TCF v2.2
6. Async-loads full banner bundle + site config from CDN
7. Banner displays; user interacts
8. On consent action: generates TC string, sets first-party cookie, calls `gtag('consent', 'update', {...})`, releases blocked scripts by category
9. POSTs consent record to Consent API for server-side audit storage
10. Fires `consent-change` custom event + dataLayer push for GTM
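
Step 2 hinges on deciding whether stored consent is still usable. The sketch below models that check in Python (the real loader is TypeScript). The cookie's JSON layout (`categories`, `version`, `ts`) is hypothetical, the raw value is assumed to be already URL-decoded, and the rule that a changed config version forces re-consent is an assumption.

```python
import json
import time
from dataclasses import dataclass
from typing import Optional

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class StoredConsent:
    categories: frozenset[str]
    config_version: int
    timestamp: float  # Unix seconds when consent was given


def read_consent(
    raw: Optional[str],
    current_version: int,
    expiry_days: int,
    now: Optional[float] = None,
) -> Optional[StoredConsent]:
    """Return the stored consent if still valid, else None.

    None means the loader falls through to step 3 (block and show banner).
    """
    if not raw:
        return None
    try:
        data = json.loads(raw)
        consent = StoredConsent(
            categories=frozenset(data["categories"]),
            config_version=int(data["version"]),
            timestamp=float(data["ts"]),
        )
    except (ValueError, KeyError, TypeError):
        return None  # malformed cookie: behave as if there is no consent
    now = time.time() if now is None else now
    if consent.config_version != current_version:
        return None  # assumed: a new config version requires fresh consent
    if now - consent.timestamp > expiry_days * SECONDS_PER_DAY:
        return None  # consent has expired
    return consent
```

Treating a malformed cookie exactly like a missing one keeps the failure mode safe: the visitor sees the banner again rather than having scripts released on bad data.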
## Banner Script Architecture
The banner is split into two files for performance:
- **consent-loader.js** (~2KB gzipped) — synchronous critical path: consent cookie read, GCM defaults, TCF stub, script interceptor installation, async bundle load
- **consent-bundle-{version}.js** (~25KB gzipped) — full UI, consent engine, TCF encoder, reporter

The banner UI renders inside **Shadow DOM** for complete style isolation from the host site.

**Display modes:** overlay (full-screen modal), bottom_banner, top_banner, corner_popup, inline (rendered into a specific DOM element)

**Auto-blocking works by:**
- Overriding `document.createElement` to intercept `<script>` tag creation
- `MutationObserver` on `<head>` and `<body>` for dynamically inserted scripts
- Proxying `document.cookie` setter to block writes from non-essential categories
- Wrapping `localStorage.setItem` and `sessionStorage.setItem`
- Maintaining a queue of blocked scripts, released per-category when consent is granted
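
The queue in the last bullet can be modelled independently of the DOM. In this sketch (Python rather than the banner's TypeScript, with a hypothetical `release` callback standing in for re-inserting the `<script>` element), scripts are released in the order they were intercepted, which keeps dependent scripts in sequence.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class BlockedScriptQueue:
    """Holds intercepted scripts until their category receives consent."""

    # Hypothetical hook; in a browser this would re-insert the <script>.
    release: Callable[[str], None]
    granted: set[str] = field(default_factory=lambda: {"necessary"})
    # (category, src) pairs in interception order.
    pending: list[tuple[str, str]] = field(default_factory=list)

    def intercept(self, src: str, category: str) -> bool:
        """Return True if the script may run now, False if it was queued."""
        if category in self.granted:
            return True
        self.pending.append((category, src))
        return False

    def grant(self, categories: set[str]) -> None:
        """Record consent and release matching scripts in original order."""
        self.granted |= categories
        still_blocked: list[tuple[str, str]] = []
        for category, src in self.pending:
            if category in self.granted:
                self.release(src)
            else:
                still_blocked.append((category, src))
        self.pending = still_blocked
```

Classifying each script into a category (for example from the allow-list or known-cookies data) happens before `intercept` is called and is outside this sketch; an unclassified script would most safely be treated as non-essential.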
## Compliance Frameworks
The compliance engine is rule-based. Each framework is a set of `ComplianceRule` objects:
- **GDPR** — opt-in consent, reject option as prominent as accept, granular consent, proof of consent, no cookie walls, no pre-ticked boxes
- **CNIL** — all GDPR rules plus: a "Tout refuser" (reject all) button on the first layer, max 13-month cookie lifetime, max 6-month consent retention, re-consent every 6 months
- **CCPA/CPRA** — opt-out model, "Do Not Sell" link, honour the GPC signal, opt-in for consumers under 16
- **ePrivacy** — consent required for non-essential cookies; strictly necessary cookies are exempt
- **LGPD** — consent or legitimate interest as the legal basis, data controller identified

Each rule outputs a severity (critical/warning/info), a message and a recommendation; results are aggregated into per-framework scores.
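
A minimal sketch of a `ComplianceRule` and per-framework scoring. The rule IDs, the shape of the site snapshot and the penalty weights are all hypothetical; the two sample rules encode CNIL requirements listed above, with 13 months approximated as 13 * 365.25 / 12 days.

```python
from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional

SiteSnapshot = dict[str, Any]  # hypothetical: resolved config + scan results


class Severity(Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    INFO = "info"


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: Severity
    message: str
    recommendation: str


@dataclass(frozen=True)
class ComplianceRule:
    rule_id: str
    severity: Severity
    message: str
    recommendation: str
    check: Callable[[SiteSnapshot], bool]  # True means the site passes

    def evaluate(self, site: SiteSnapshot) -> Optional[Finding]:
        if self.check(site):
            return None
        return Finding(self.rule_id, self.severity, self.message, self.recommendation)


# Hypothetical penalty weights for turning findings into a 0-100 score.
PENALTY = {Severity.CRITICAL: 25, Severity.WARNING: 10, Severity.INFO: 0}


def score(rules: list[ComplianceRule], site: SiteSnapshot) -> tuple[int, list[Finding]]:
    """Evaluate every rule in one framework and return (score, findings)."""
    findings = [f for f in (rule.evaluate(site) for rule in rules) if f is not None]
    return max(0, 100 - sum(PENALTY[f.severity] for f in findings)), findings


MAX_CNIL_COOKIE_DAYS = 13 * 365.25 / 12  # 13 months, roughly 395.7 days

CNIL_EXTRA_RULES = [
    ComplianceRule(
        rule_id="cnil.reject_all_first_layer",
        severity=Severity.CRITICAL,
        message="No 'Tout refuser' button on the first layer.",
        recommendation="Add a reject-all button next to accept on the first layer.",
        check=lambda s: bool(s.get("first_layer_reject_button")),
    ),
    ComplianceRule(
        rule_id="cnil.cookie_lifetime",
        severity=Severity.WARNING,
        message="At least one cookie outlives the 13-month maximum.",
        recommendation="Cap cookie lifetimes at 13 months.",
        check=lambda s: all(
            days <= MAX_CNIL_COOKIE_DAYS for days in s.get("cookie_lifetimes_days", [])
        ),
    ),
]
```

Because CNIL extends GDPR, a CNIL score could be computed as `score(gdpr_rules + CNIL_EXTRA_RULES, site)`, where `gdpr_rules` is a hypothetical list built the same way.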
## Coding Conventions
- **Language:** British English throughout (code comments, UI strings, documentation)
- **Python:** Use `pyproject.toml`, type hints everywhere, async where possible
- **SQL:** CTEs over subqueries, no `SELECT *`, explicit column lists
- **TypeScript:** strict mode, explicit return types on exported functions
- **Git:** conventional commits (`feat:`, `fix:`, `chore:`, `docs:`)
- **Testing:** pytest for Python, Vitest for TypeScript, aim for >80% coverage on services
- **API design:** RESTful, Pydantic schemas for all request/response bodies, consistent error format
- **Database:** UUID primary keys, `created_at`/`updated_at` timestamps on all tables, soft deletes where appropriate
## Development Environment
```bash
# Start everything
docker compose up -d
# Run migrations
make migrate
# Seed default data (cookie categories, known cookies)
make seed
# Run tests
make test
# Lint
make lint
```
Services in Docker Compose:
- `api` — FastAPI on port 8000
- `scanner` — Playwright scanner service
- `postgres` — PostgreSQL 16 on port 5432
- `redis` — Redis on port 6379
- `admin-ui` — Vite dev server on port 5173 (also dog-foods the banner)
## Implementation Phases
| Phase | Scope |
|-------|-------|
| 1 (Weeks 1–3) | DB schema, FastAPI scaffold, auth, site CRUD, basic banner, consent API, Docker Compose |
| 2 (Weeks 4–6) | TCF v2.2, Google Consent Mode v2, script interceptor/auto-blocking, cookie categories, allow-list, config hierarchy, admin UI scaffold |
| 3 (Weeks 7–8) | Playwright crawler, auto-categorisation, client-side reporter, scan scheduling, admin UI for scans |
| 4 (Weeks 9–10) | Compliance rule engine (GDPR/CNIL/CCPA/ePrivacy/LGPD), consent analytics API, compliance + analytics admin UI |
| 5 (Weeks 11–12) | Banner builder (visual editor), all display modes, full i18n, a11y audit, GeoIP, multi-domain, Helm chart, security hardening, load testing |
## Key External Standards
- **IAB TCF v2.2:** [IAB TCF Technical Specification](https://github.com/InteractiveAdvertisingBureau/GDPR-Transparency-and-Consent-Framework/blob/master/TCFv2/IAB%20Tech%20Lab%20-%20Consent%20string%20and%20vendor%20list%20formats%20v2.md)
- **Google Consent Mode v2:** [Google Developer Docs](https://developers.google.com/tag-platform/security/guides/consent)
- **Global Vendor List (GVL):** Loaded from IAB, cached, updated regularly
- **WCAG 2.1 AA:** Accessibility target for the banner UI